SURFACE EXPRESSION LIBRARIES
OF RANDOMIZED PEPTIDES 



BACKGROUND OF THE INVENTION

This invention relates generally to methods for
synthesizing and expressing oligonucleotides and, more
particularly, to methods for expressing oligonucleotides
having random codon sequences.

Oligonucleotide synthesis proceeds via linear coupling
of individual monomers in a stepwise reaction. The
reactions are generally performed on a solid phase support
by first coupling the 3' end of the first monomer to the
support. The second monomer is added to the 5' end of the
first monomer in a condensation reaction to yield a
dinucleotide coupled to the solid support. At the end of
each coupling reaction, the by-products and unreacted, free
monomers are washed away so that the starting material for
the next round of synthesis is the pure oligonucleotide
attached to the support. In this reaction scheme, the
stepwise addition of individual monomers to a single,
growing end of an oligonucleotide ensures accurate synthesis
of the desired sequence. Moreover, unwanted side reactions,
such as the condensation of two oligonucleotides, are
eliminated, resulting in high product yields.

In some instances, it is desired that synthetic
oligonucleotides have random nucleotide sequences. This
result can be accomplished by adding equal proportions of
all four nucleotides in the monomer coupling reactions,
leading to the random incorporation of all nucleotides and
yielding a population of oligonucleotides with random
sequences. Since all possible combinations of nucleotide
sequences are represented within the population, all
possible codon triplets will also be represented. If the
objective is ultimately to generate random peptide
products, this approach has a severe limitation because the
random codons synthesized will bias the amino acids
incorporated during translation of the DNA by the cell into
polypeptides.

The bias is due to the redundancy of the genetic code.
Four nucleotide monomers give sixty-four possible triplet
codons. With only twenty amino acids to specify, many of
the amino acids are encoded by multiple codons. Therefore,
a population of oligonucleotides synthesized by sequential
addition of monomers from a random population will not
encode peptides whose amino acid sequences represent all
possible combinations of the twenty different amino acids
in equal proportions. That is, the frequency of amino acids
incorporated into polypeptides will be biased toward those
amino acids which are specified by multiple codons.
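A minimal Python sketch of the bias described above: it tallies how many of the 64 triplets specify each amino acid in the standard genetic code (NCBI translation table 1, written as the usual 64-letter string with bases in T, C, A, G order). The helper name `codons_per_residue` is illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in T, C, A, G order, first position varying slowest; "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_residue(table):
    """Count how many of the 64 triplets specify each amino acid (or stop)."""
    return Counter(table.values())


counts = codons_per_residue(CODON_TABLE)

# With equimolar monomers every triplet is equally likely, so each residue's
# share is (number of its codons) / 64: Leu, Ser and Arg (6 codons) are six
# times as frequent as Met or Trp (1 codon).
for residue, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{residue}: {n}/64 = {n / 64:.3f}")
```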






To alleviate amino acid bias due to the redundancy of
the genetic code, the oligonucleotides can be synthesized
from nucleotide triplets. Here, a triplet coding for each
of the twenty amino acids is synthesized from individual
monomers. Once synthesized, the triplets are used in the
coupling reactions instead of individual monomers. By
mixing equal proportions of the triplets, synthesis of
oligonucleotides with random codons can be accomplished.
However, the cost of synthesis from such triplets far
exceeds that of synthesis from individual monomers because
triplets are not commercially available.

Amino acid bias can be reduced, however, by
synthesizing the degenerate codon sequence NNK, where N is
a mixture of all four nucleotides and K is a mixture of
guanine and thymine nucleotides. Each position within an
oligonucleotide having this codon sequence will contain a
total of 32 codons (12 amino acids being represented once,
5 represented twice, 3 represented three times and one
codon being a stop codon). Oligonucleotides expressed with
such degenerate codon sequences will produce peptide
products whose sequences are biased toward those amino
acids represented more than once. Thus, populations of
peptides whose sequences are completely random cannot be
obtained from oligonucleotides synthesized from degenerate
sequences.
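The NNK figures quoted above can be checked by expanding the degenerate codon against the standard genetic code. The sketch below assumes IUPAC degeneracy symbols (N = A/C/G/T, K = G/T); `expand` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC symbols needed here: N = any base, K = G or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(degenerate):
    """List every concrete codon represented by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


nnk = expand("NNK")
per_residue = Counter(CODON_TABLE[c] for c in nnk)
stops = per_residue.pop("*", 0)               # TAG is the only NNK stop codon
multiplicity = Counter(per_residue.values())  # residues appearing 1, 2 or 3 times

print(len(nnk), dict(sorted(multiplicity.items())), stops)  # 32 {1: 12, 2: 5, 3: 3} 1
```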

There thus exists a need for a method to express
oligonucleotides having a fully random or desirably biased
sequence which alleviates genetic redundancy. The present
invention satisfies this need and provides additional
advantages as well.

SUMMARY OF THE INVENTION

The invention provides a plurality of procaryotic
cells containing a diverse population of expressible
oligonucleotides operationally linked to expression
elements, the expressible oligonucleotides having a
desirable bias of random codon sequences.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic drawing for synthesizing 
oligonucleotides from nucleotide monomers with random 
tuplets at each position using twenty reaction vessels. 

Figure 2 is a schematic drawing for synthesizing
oligonucleotides from nucleotide monomers with random
tuplets at each position using ten reaction vessels.

Figure 3 is a schematic diagram of the two vectors
used for sublibrary and library production from precursor
oligonucleotide portions. M13IX22 (Figure 3A) is the
vector used to clone the anti-sense precursor portions.
Selection and relevant restriction sites are shown.
M13IX42 (Figure 3B) is the vector used to clone the sense
precursor portions (open box). Thick lines represent the
pseudo-wild type (ΨgVIII) and wild type (gVIII) gene VIII
sequences. The double-headed arrow represents the portion
of M13IX42 which is to be combined with M13IX22. The two
amber stop codons and relevant restriction sites are also
shown. Figure 3C shows the joining of vector populations
from sublibraries to form the functional surface expression
vector M13IX. Figure 3D shows the generation of a surface
expression library and the production of phage. The phage
are used to infect a suppressor strain (Figure 3E) for
screening of the library.

Figure 4 is a schematic diagram of the vector used for
generation of surface expression libraries from random
oligonucleotide populations (M13IX30). The symbols are as
described for Figure 3.

Figure 5 is the nucleotide sequence of M13IX42 (SEQ ID
NO: 1).

Figure 6 is the nucleotide sequence of M13IX22 (SEQ ID
NO: 2).

Figure 7 is the nucleotide sequence of M13IX30 (SEQ ID
NO: 3).

Figure 8 is the nucleotide sequence of M13ED03 (SEQ ID
NO: 4).

Figure 9 is the nucleotide sequence of M13IX421 (SEQ
ID NO: 5).

Figure 10 is the nucleotide sequence of M13ED04 (SEQ
ID NO: 6).

DETAILED DESCRIPTION OF THE INVENTION

This invention is directed to a simple and inexpensive
method for synthesizing and expressing oligonucleotides
having a desirable bias of random codons using individual
monomers. The method is advantageous in that individual
monomers are used instead of triplets, and by synthesizing
only a non-degenerate subset of all triplets, codon
redundancy is alleviated. Thus, the oligonucleotides
synthesized represent a large proportion of the possible
random triplet sequences. The
oligonucleotides can be expressed, for example, on the
surface of filamentous bacteriophage in a form which does
not alter phage viability or impose biological selections
against certain peptide sequences. The oligonucleotides
produced are therefore useful for generating an unlimited
number of pharmacological and research products.

In one embodiment, the invention entails the
sequential coupling of monomers to produce oligonucleotides
with a desirable bias of random codons. The coupling
reactions for the randomization of the twenty codons which
specify the amino acids of the genetic code are performed
in ten different reaction vessels. Each reaction vessel
contains a support on which the monomers for two different
codons are coupled in three sequential reactions. One of
the reactions couples an equal mixture of two monomers such
that the final product has two different codon sequences.
The codons are randomized by removing the supports from the
reaction vessels and mixing them to produce a single batch
of supports containing all twenty codons at a particular
position. Synthesis at the next codon position proceeds by
equally dividing the mixed batch of supports into ten
reaction vessels as before and sequentially coupling the
monomers for each pair of codons. The supports are again
mixed to randomize the codons at the position just
synthesized. The cycle of coupling, mixing and dividing
continues until the desired number of codon positions has
been randomized. After the last position has been
randomized, the oligonucleotides with random codons are
cleaved from the support. The random oligonucleotides can
then be expressed, for example, on the surface of
filamentous bacteriophage as gene VIII-peptide fusion
proteins. Alternative genes can be used as well.

In its broadest form, the invention provides a diverse
population of synthetic oligonucleotides contained in
vectors so as to be expressible in cells. Such populations
of diverse oligonucleotides can be fully random at one or
more codon sites or can be fully defined at one or more
sites, so long as the codons at one or more sites are
randomly variable. The populations of oligonucleotides can
be expressed as fusion products in combination with surface
proteins of filamentous bacteriophage, such as M13, as with
gene VIII. The vectors can be transfected into a plurality
of cells, such as the procaryote E. coli.

The diverse population of oligonucleotides can be 
formed by randomly combining first and second precursor 
populations, each precursor population having a desirable 
bias of random codon sequences. Methods of synthesizing 
and expressing the diverse population of expressible 
oligonucleotides are also provided. 

In a preferred embodiment, two populations of random
oligonucleotides are synthesized. The oligonucleotides
within each population encode a portion of the final
oligonucleotide which is to be expressed. Oligonucleotides
within one population encode the carboxy terminal portion
of the expressed oligonucleotides. These oligonucleotides
are cloned in frame with a gene VIII (gVIII) sequence so
that translation of the sequence produces peptide fusion
proteins. The second population of oligonucleotides is
cloned into a separate vector. Each oligonucleotide within
this population encodes the anti-sense of the amino
terminal portion of the expressed oligonucleotides. This
vector also contains the elements necessary for expression.
The two vectors containing the random oligonucleotides are
combined such that the two precursor oligonucleotide
portions are joined together at random to form a population
of larger oligonucleotides derived from two smaller
portions. The vectors contain selectable markers to ensure
maximum efficiency in joining together the two
oligonucleotide populations. A mechanism also exists to
control the expression of gVIII-peptide fusion proteins
during library construction and screening.

As used herein, the term "monomer" or "nucleotide
monomer" refers to individual nucleotides used in the
chemical synthesis of oligonucleotides. Monomers that can
be used include both the ribo- and deoxyribo- forms of each
of the five standard nucleotides (derived from the bases
adenine (A or dA, respectively), guanine (G or dG),
cytosine (C or dC), thymine (T) and uracil (U)).
Derivatives and precursors of bases such as inosine which
are capable of supporting polypeptide biosynthesis are also
included as monomers. Also included are chemically
modified nucleotides, for example, one having a reversible
blocking agent attached to any of the positions on the
purine or pyrimidine bases, the ribose or deoxyribose sugar
or the phosphate or hydroxyl moieties of the monomer. Such
blocking groups include, for example, dimethoxytrityl,
benzoyl, isobutyryl, beta-cyanoethyl and diisopropylamine
groups, and are used to protect hydroxyls, exocyclic amines
and phosphate moieties. Other blocking agents can also be
used and are known to one skilled in the art.
As used herein, the term "tuplet" refers to a group of
elements of a definable size. The elements of a tuplet as
used herein are nucleotide monomers. For example, a tuplet
can be a dinucleotide, a trinucleotide or can also be four
or more nucleotides.

As used herein, the term "codon" or "triplet" refers
to a tuplet consisting of three adjacent nucleotide
monomers which specify one of the twenty naturally
occurring amino acids found in polypeptide biosynthesis.
The term also includes nonsense, or stop, codons which do
not specify any amino acid.

"Random codons" or "randomized codons," as used
herein, refers to more than one codon at a position within
a collection of oligonucleotides. The number of different
codons can be from two to twenty at any particular
position. "Randomized oligonucleotides," as used herein,
refers to a collection of oligonucleotides with random
codons at one or more positions. "Random codon sequences"
as used herein means that more than one codon position
within a randomized oligonucleotide contains random codons.
For example, if randomized oligonucleotides are six
nucleotides in length (i.e., two codons) and both the first
and second codon positions are randomized to encode all
twenty amino acids, then a population of oligonucleotides
having random codon sequences with every possible
combination of the twenty triplets in the first and second
positions makes up the above population of randomized
oligonucleotides. The number of possible codon
combinations is 20^2, or 400. Likewise, if randomized
oligonucleotides of fifteen nucleotides in length are
synthesized which have random codon sequences at all
positions encoding all twenty amino acids, then all
triplets coding for each of the twenty amino acids will be
found in equal proportions at every position. The
population constituting the randomized oligonucleotides
will contain 20^5 (3,200,000) different possible species of
oligonucleotides. "Random tuplets," or "randomized
tuplets," are defined analogously.
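The species counts in these definitions follow from multiplying the number of codons allowed at each position. A small sketch (the function name `species_count` is illustrative):

```python
import math
from itertools import product


def species_count(codons_per_position):
    """Distinct oligonucleotides possible when position i can carry any of
    codons_per_position[i] codons: the product over all positions."""
    return math.prod(codons_per_position)


# Six nucleotides (two codon positions), each randomized over 20 codons:
# enumerate the combinations explicitly to confirm the product.
two_codon_library = list(product(range(20), repeat=2))
print(len(two_codon_library), species_count([20, 20]))  # 400 400

# Fifteen nucleotides = five codon positions: 20**5 species.
print(species_count([20] * 5))  # 3200000
```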

As used herein, the term "bias" refers to a
preference. It is understood that there can be degrees of
preference or bias toward codon sequences which encode
particular amino acids. For example, an oligonucleotide
whose codon sequences do not preferentially encode particular
amino acids is unbiased and therefore completely random.
The oligonucleotide codon sequences can also be biased
toward predetermined codon sequences or codon frequencies
and, while still diverse and random, will exhibit codon
sequences biased toward a defined, or preferred, sequence.
"A desirable bias of random codon sequences," as used
herein, refers to the predetermined degree of bias, which
can be selected from totally random to essentially, but not
totally, defined (or preferred). There must be at least
one codon position which is variable, however.

As used herein, the term "support" refers to a solid 
phase material for attaching monomers for chemical 
synthesis. Such support is usually composed of materials 
such as beads of controlled pore glass but can be other
materials known to one skilled in the art. The term is 
also meant to include one or more monomers coupled to the 
support for additional oligonucleotide synthesis reactions. 

As used herein, the terms "coupling" or "condensing"
refer to the chemical reactions for attaching one monomer
to a second monomer or to a solid support. Such reactions 
are known to one skilled in the art and are typically 
performed on an automated DNA synthesizer such as a 
MilliGen/Biosearch Cyclone Plus Synthesizer using 
procedures recommended by the manufacturer. "Sequentially 
coupling" as used herein, refers to the stepwise addition 
of monomers. 
A method of synthesizing oligonucleotides having
random tuplets using individual monomers is described. The
method consists of several steps, the first being synthesis
of a nucleotide tuplet for each tuplet to be randomized.
As described here and below, a nucleotide triplet (i.e., a
codon) will be used as a specific example of a tuplet. Any
size tuplet will work using the methods disclosed herein,
and one skilled in the art would know how to use the
methods to randomize tuplets of any size.

If the randomization of codons specifying all twenty
amino acids is desired at a position, then twenty different
codons are synthesized. Likewise, if randomization of only
ten codons at a particular position is desired, then those
ten codons are synthesized. Randomization of from two to
sixty-four codons can be accomplished by synthesizing each
desired triplet. Preferably, randomization of from two to
twenty codons is used for any one position because of the
redundancy of the genetic code. The codons selected at one
position do not have to be the same codons selected at the
next position. Additionally, either the sense or the anti-sense
sequence oligonucleotide can be synthesized. The process
therefore provides for randomization of any desired codon
position with any number of codons.

Codons to be randomized are synthesized sequentially
by coupling the first monomer of each codon to separate
supports. The supports for the synthesis of each codon
can, for example, be contained in different reaction
vessels such that one reaction vessel corresponds to the
monomer coupling reactions for one codon. As will be used
here and below, if twenty codons are to be randomized, then
twenty reaction vessels can be used in independent coupling
reactions for the first monomer of each of the twenty codons.
Synthesis proceeds by sequentially coupling the second
monomer of each codon to the first monomer to produce a
dimer, followed by coupling the third monomer for each
codon to each of the above-synthesized dimers to produce a
trimer (Figure 1, step 1, where M1, M2 and M3 represent the
first, second and third monomer, respectively, for each
codon to be randomized).

Following synthesis of the first codons from
individual monomers, randomization is achieved by
mixing the supports from all twenty reaction vessels which
contain the individual codons to be randomized. The solid
phase support can be removed from its vessel and mixed to
achieve a random distribution of all codon species within
the population (Figure 1, step 2). The mixed population of
supports, constituting all codon species, is then
redistributed into twenty independent reaction vessels
(Figure 1, step 3). The resultant vessels are all
identical and contain equal portions of all twenty codons
coupled to a solid phase support.

For randomization of the second position codon, the
twenty additional codons are synthesized, one per vessel,
in the twenty reaction vessels produced in step 3, which
serve as the condensing substrates in place of those of
step 1 (Figure 1, step 4). Steps
1 and 4 are therefore equivalent except that step 4 uses
the supports produced by the previous synthesis cycle
(steps 1 through 3) for codon synthesis whereas step 1 is
the initial synthesis of the first codon in the
oligonucleotide. The supports resulting from step 4 will
each have two codons attached to them (i.e., a
hexanucleotide) with the codon at the first position being
any one of twenty possible codons (i.e., random) and the
codon at the second position being one of the twenty
possible codons.

For randomization of the codon at the second position
and synthesis of the third position codon, steps 2 through
4 are again repeated. This process yields in each vessel
a three codon oligonucleotide (i.e., 9 nucleotides) with
codon positions 1 and 2 randomized and position three
containing one of the twenty possible codons. Steps 2
through 4 are repeated to randomize the third position
codon and synthesize the codon at the next position. The
process is continued until an oligonucleotide of the
desired length is achieved. After the final randomization
step, the oligonucleotide can be cleaved from the supports
and isolated by methods known to one skilled in the art.
Alternatively, the oligonucleotides can remain on the
supports for use in methods employing probe hybridization.
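The mix-and-divide cycle of Figure 1 can be modelled in a few lines. This is a simplified simulation, not a synthesizer protocol: it assumes perfect coupling, an exactly equal division of beads among vessels, and a bead count that is a multiple of the vessel count. It illustrates two properties stated above: each bead carries a single sequence, and every codon is equally represented at every randomized position. The codon list is a hypothetical choice of one codon per amino acid, and `split_and_pool` is an illustrative name.

```python
import random
from collections import Counter


def split_and_pool(n_beads, codon_sets, seed=0):
    """Simulate mix-and-divide synthesis (Figure 1).

    For each codon position: the pooled beads are shuffled (mixing), dealt
    into one vessel per codon in equal portions (dividing), and each vessel
    couples its codon to every bead it holds. Returns one tuple of codons per
    bead, so every bead carries a single sequence.
    """
    rng = random.Random(seed)
    beads = [() for _ in range(n_beads)]
    for codons in codon_sets:
        if n_beads % len(codons):
            raise ValueError("bead count must divide evenly among the vessels")
        rng.shuffle(beads)                       # mixing step
        share = n_beads // len(codons)           # beads per vessel
        beads = [seq + (codons[i // share],)     # vessel i // share couples its codon
                 for i, seq in enumerate(beads)]
    return beads


# Hypothetical choice of one codon per amino acid (Ala, Arg, Asn, ... Val).
TWENTY = ["GCT", "CGT", "AAT", "GAT", "TGT", "CAG", "GAG", "GGT", "CAT", "ATT",
          "CTG", "AAA", "ATG", "TTT", "CCT", "TCT", "ACT", "TGG", "TAT", "GTT"]

beads = split_and_pool(2000, [TWENTY] * 3)
for pos in range(3):
    counts = Counter(seq[pos] for seq in beads)
    print(pos, len(counts), set(counts.values()))  # 20 codons, 100 beads each
```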

The diversity of codon sequences, i.e., the number of
different possible oligonucleotides, which can be obtained
using the methods of the present invention is extremely
large and limited only by the physical characteristics of
available materials. For example, a support composed of
beads of about 100 µm in diameter will be limited to about
10,000 beads/reaction vessel using a 1 µM reaction vessel
containing 25 mg of beads. This size bead can support
about 1 x 10^ oligonucleotides per bead. Synthesis using
separate reaction vessels for each of the twenty amino
acids will produce beads in which all the oligonucleotides
attached to an individual bead are identical. The
diversity which can be obtained under these conditions is
approximately 10^ copies of 10,000 x 20, or 200,000, different
random oligonucleotides. The diversity can be increased,
however, in several ways without departing from the basic
methods disclosed herein. For example, the number of
possible sequences can be increased by decreasing the size
of the individual beads which make up the support. A bead
of about 30 µm in diameter will increase the number of
beads per reaction vessel and therefore the number of
oligonucleotides synthesized. Another way to increase the
diversity of oligonucleotides with random codons is to
increase the volume of the reaction vessel. For example,
using the same size bead, a larger vessel can contain a
greater number of beads than a smaller vessel and therefore
support the synthesis of a greater number of
oligonucleotides. Increasing the number of codons coupled
to a support in a single reaction vessel also increases the
diversity of the random oligonucleotides. The diversity on
each bead will be the number of codons coupled per vessel
raised to the number of codon positions synthesized. For
example, using ten reaction vessels, each synthesizing two
codons to randomize a total of twenty codons, the number of
different oligonucleotides of ten codons in length per
100 µm bead can be increased such that each bead will
contain about 2^10, or about 1 x 10^3, different sequences
instead of one. One skilled in the art will know how to
modify such parameters to increase the diversity of
oligonucleotides with random codons.
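The diversity arithmetic in this paragraph reduces to two products, sketched below with illustrative function names; the inputs are the figures quoted in the text.

```python
def distinct_bead_sequences(beads_per_vessel, vessels):
    """One sequence per bead: distinct sequences equal the total bead count."""
    return beads_per_vessel * vessels


def sequences_per_bead(codons_per_vessel, positions):
    """Mixed-codon coupling: each bead carries (codons per vessel) ** positions sequences."""
    return codons_per_vessel ** positions


print(distinct_bead_sequences(10_000, 20))  # 200000
print(sequences_per_bead(2, 10))            # 1024, i.e. about 1 x 10^3
print(sequences_per_bead(1, 10))            # 1: one codon per vessel, one sequence per bead
```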

A method of synthesizing oligonucleotides having
random codons at each position using individual monomers,
wherein the number of reaction vessels is less than the
number of codons to be randomized, is also described. For
example, if twenty codons are to be randomized at each
position within an oligonucleotide population, then ten
reaction vessels can be used. The use of a smaller number
of reaction vessels than the number of codons to be
randomized at each position is preferred because the
smaller number of reaction vessels is easier to manipulate
and results in a greater number of possible
oligonucleotides synthesized.

The use of a smaller number of reaction vessels for
random synthesis of twenty codons at a desired position
within an oligonucleotide is similar to that described
above using twenty reaction vessels except that each
reaction vessel can contain the synthesis products of more
than one codon. For example, step one synthesis using ten
reaction vessels proceeds by coupling about two different
codons on supports contained in each of ten reaction
vessels. This is shown in Figure 2, where the pair of
codons coupled to the support in each vessel can consist
of the following: (1) (T/G)TT for Phe and Val; (2) (T/C)CT
for Ser and Pro; (3) (T/C)AT for Tyr and His; (4) (T/C)GT
for Cys and Arg; (5) (C/A)TG for Leu and Met; (6) (C/G)AG
for Gln and Glu; (7) (A/G)CT for Thr and Ala; (8) (A/G)AT
for Asn and Asp; (9) (T/G)GG for Trp and Gly; and
(10) A(T/A)A for Ile and Lys. The monomers separated by a
slash are used as an equal mixture in the indicated
coupling step. The anti-sense sequence for each of the
above codons can be generated by synthesizing the
complementary sequence. Amino acids are given in the
standard three letter nomenclature.
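As a consistency check on the ten pairs (T/G)TT, (T/C)CT, (T/C)AT, (T/C)GT, (C/A)TG, (C/G)AG, (A/G)CT, (A/G)AT, (T/G)GG and A(T/A)A, the sketch below expands each degenerate codon (an "X/Y" mixture coupled in one step) and translates the results with the standard genetic code. The tuple encoding of each position is only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One entry per reaction vessel; each string lists the monomer(s) coupled at
# that position, so "TG" stands for the T/G mixture.
PAIRS = [
    ("TG", "T", "T"), ("TC", "C", "T"), ("TC", "A", "T"), ("TC", "G", "T"),
    ("CA", "T", "G"), ("CG", "A", "G"), ("AG", "C", "T"), ("AG", "A", "T"),
    ("TG", "G", "G"), ("A", "TA", "A"),
]


def expand(positions):
    """All concrete codons produced in one vessel."""
    return ["".join(c) for c in product(*positions)]


codons = [c for vessel in PAIRS for c in expand(vessel)]
residues = [CODON_TABLE[c] for c in codons]
print(len(codons), len(set(residues)), "".join(sorted(residues)))
```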

Coupling of the monomers in this fashion will yield
codons specifying all twenty naturally occurring amino
acids attached to supports in ten reaction vessels.
However, the number of individual reaction vessels to be
used will depend on the number of codons to be randomized
at the desired position. For example, if ten codons are to
be randomized, then five reaction vessels can be used for
coupling. The codon sequences given above can be used for
this synthesis as well. The sequences of the codons can
also be varied within the redundancy of the genetic code.



The remaining steps of synthesis of oligonucleotides
with random codons using ten reaction vessels are as
outlined above for synthesis with twenty reaction vessels,
except that the mixing and dividing steps are performed
with supports from a smaller number of reaction vessels.
These remaining steps are shown in Figure 2 (steps 2
through 4).
Oligonucleotides having at least one specified tuplet
at a predetermined position and the remaining positions
having random tuplets can also be synthesized using the
methods described herein. The synthesis steps are similar
to those outlined above using twenty or fewer reaction
vessels except that, prior to synthesis of the specified
codon position, the dividing of the supports into separate
reaction vessels for synthesis of different codons is
omitted. For example, if the codon at the second position
of the oligonucleotide is to be specified, then following
synthesis of random codons at the first position and mixing
of the supports, the mixed supports are not divided into
new reaction vessels but, instead, can be contained in a
single reaction vessel to synthesize the specified codon.
The specified codon is synthesized sequentially from
individual monomers as described above. Thus, the number
of reaction vessels can be increased or decreased at each
step to allow for the synthesis of a specified codon or a
desired number of random codons.

Following codon synthesis, the mixed supports are
divided into individual reaction vessels for synthesis of
the next codon to be randomized (Figure 1, step 3) or can
be used without separation for synthesis of a consecutive
specified codon. The rounds of synthesis can be repeated
for each codon to be added until the desired number of
positions with predetermined or randomized codons is
obtained.

Oligonucleotides with the first position codon
specified can also be synthesized using the
above method. In this case, the first position codon is
synthesized from the appropriate monomers. The supports
are divided into the required number of reaction vessels
needed for synthesis of random codons at the second
position, and the rounds of synthesis, mixing and dividing
are performed as described above.
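The effect of specified positions on the vessel count and library size can be tallied position by position. In this sketch (hypothetical names), a specified position is one codon and therefore one vessel with no dividing step, while random positions are split among vessels holding `codons_per_vessel` codons each, as in the ten-vessel scheme.

```python
import math


def synthesis_plan(codons_per_position, codons_per_vessel=1):
    """Per position: codons coupled, vessels needed and running species count.
    A position with a single (specified) codon always uses one vessel."""
    plan, species = [], 1
    for n_codons in codons_per_position:
        vessels = math.ceil(n_codons / codons_per_vessel)
        species *= n_codons
        plan.append({"codons": n_codons, "vessels": vessels, "species": species})
    return plan


# Random, specified, random, random positions using codon pairs per vessel.
for step in synthesis_plan([20, 1, 20, 20], codons_per_vessel=2):
    print(step)
```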
A method of synthesizing oligonucleotides having
tuplets which are diverse but biased toward a predetermined
sequence is also described herein. This method employs two
reaction vessels, one vessel for the synthesis of a
predetermined sequence and the second vessel for the
synthesis of a random sequence. This method is
advantageous to use when a significant number of codon
positions, for example, are to be of a specified sequence,
since it avoids the use of multiple reaction vessels.
Instead, in the random-sequence vessel, a mixture of four
different monomers such as adenine, guanine, cytosine and
thymine nucleotides is used for the first and second
monomers in the codon. The codon is completed by coupling
a mixture of a pair of monomers, either guanine and thymine
or cytosine and adenine nucleotides, at the third monomer
position. In the other vessel, nucleotide monomers are
coupled sequentially to yield the predetermined codon
sequence. Mixing of the two supports yields a population
of oligonucleotides containing both the predetermined codon
and the random codons at the desired position. Synthesis
can proceed by using this mixture of supports in a single
reaction vessel, for example, for coupling additional
predetermined codons or by further dividing the mixture
into two reaction vessels for synthesis of additional
random codons.

The two reaction vessel method can be used for codon
synthesis within an oligonucleotide with a predetermined
tuplet sequence by dividing the support mixture into two
portions at the desired codon position to be randomized.
Additionally, this method allows the extent of
randomization to be adjusted. For example, unequal mixing
or dividing of the two supports will change the fraction of
codons with predetermined sequences compared to those with
random codons at the desired position. Unequal mixing and
dividing of supports can be useful when there is a need to
synthesize random codons at a significant number of
positions within an oligonucleotide of a longer or shorter
length.
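The codon composition at the mixed position can be estimated as below. The sketch assumes the random vessel carries an equimolar NNK or NNM codon (N = A/C/G/T, K = G/T, M = A/C) and that supports mix in exact proportion; `codon_distribution` is a hypothetical helper.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "M": "AC"}


def expand(degenerate):
    """All concrete codons represented by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def codon_distribution(predetermined, degenerate, fraction_predetermined):
    """Codon frequencies at one position after mixing supports from the
    predetermined-codon vessel (weight f) and the random-codon vessel
    (weight 1 - f, spread evenly over the codons of the degenerate codon)."""
    random_codons = expand(degenerate)
    share = (1 - fraction_predetermined) / len(random_codons)
    dist = {c: share for c in random_codons}
    # The predetermined codon may also occur among the random codons.
    dist[predetermined] = dist.get(predetermined, 0.0) + fraction_predetermined
    return dist


# Equal mixing of the two supports, NNK as the random codon.
dist = codon_distribution("ATG", "NNK", 0.5)
print(round(dist["ATG"], 4), round(dist["GCT"], 4))
```

Unequal mixing is then just a different `fraction_predetermined`; for example, 0.25 gives a position that carries the predetermined codon in roughly a quarter of the population plus its share of the random codons.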

The extent of randomization can also be adjusted by
using unequal mixtures of monomers in the first, second and
third monomer coupling steps of the random codon position.
The unequal mixtures can be used in any or all of the
coupling steps to yield a population of codons enriched in
sequences reflective of the monomer proportions.
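Under the simplifying assumption that each monomer couples in proportion to its mole fraction in the mixture (in practice phosphoramidites couple with somewhat different efficiencies), the expected codon frequencies are the products of the per-step fractions. The function name and the example mixtures are illustrative.

```python
from itertools import product


def codon_frequencies(step1, step2, step3):
    """Expected codon frequencies when the three coupling steps use monomer
    mixtures with the given mole fractions (assumes coupling in proportion
    to mole fraction, independently at each step)."""
    freqs = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(step1.items(), step2.items(), step3.items()):
        freqs[b1 + b2 + b3] = p1 * p2 * p3
    return freqs


equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
g_rich = {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}   # hypothetical third-step mixture

f = codon_frequencies(equal, equal, g_rich)
top = sorted(f.items(), key=lambda kv: -kv[1])[:3]
print(top)  # codons ending in G are enriched
```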

Synthesis of randomized oligonucleotides is performed
using methods well known to one skilled in the art. Linear
coupling of monomers can, for example, be accomplished
using phosphoramidite chemistry with a MilliGen/Biosearch
Cyclone Plus automated synthesizer as described by the
manufacturer (Millipore, Burlington, MA). Other
chemistries and automated synthesizers can be employed as
well and are known to one skilled in the art.

Synthesis of multiple codons can be performed without
modification to the synthesizer by separately synthesizing
the codons in individual sets of reactions. Alternatively,
an automated DNA synthesizer can be modified for the
simultaneous synthesis of codons in multiple reaction
vessels.

In one embodiment, the invention provides a plurality
of procaryotic cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, the expressible oligonucleotides
having a desirable bias of random codon sequences produced
from diverse combinations of first and second
oligonucleotides having a desirable bias of random codon
sequences. The invention provides a method of
constructing such a plurality of procaryotic cells as well.

The oligonucleotides synthesized by the above methods
can be used to express a plurality of random peptides which
are unbiased, diverse but biased toward a predetermined
sequence, or which contain at least one specified codon at
a predetermined position. The choice of which type of
oligonucleotide to express to give the resultant population
of random peptides will depend on the need and is known to
one skilled in the art. Expression can be performed in any
compatible vector/host system. Such systems include, for
example, plasmids or phagemids in procaryotes such as
E. coli, yeast systems, and other eucaryotic systems such as
mammalian cells, but will be described herein in the context
of the presently preferred embodiment, i.e., expression on
the surface of filamentous bacteriophage. Filamentous
bacteriophage can be, for example, M13, f1 and fd. Such
phage have circular single-stranded genomes and
double-stranded replicative DNA forms. Additionally, the
peptides can be expressed in soluble or secreted form
depending on the need and the vector/host system employed.

Expression of random peptides on the surface of M13
can be accomplished, for example, using the vector system
shown in Figure 3. The construction of these vectors is
set out in Examples I and II in enough detail to enable
one of ordinary skill to make them. The complete nucleotide
sequences are given in Figures 5, 6 and 7 (SEQ ID NOS: 1, 2
and 3, respectively). This system produces random
oligonucleotides functionally linked to expression elements
and to gVIII by combining two smaller oligonucleotide
portions contained in separate vectors into a single
vector. The diversity of oligonucleotide species obtained
by this system or others described herein can be 5 x 10 or
greater. Diversity of less than 5 x 10 can also be
obtained and will be determined by the need and type of
random peptides to be expressed. The random combination of
two precursor portions into a larger oligonucleotide
increases the diversity of the population several fold and
has the added advantage of producing oligonucleotides
larger than what can be synthesized by standard methods.
Additionally, although the correlation is not known, when
the number of possible paths an oligonucleotide can take
during synthesis such as described herein is greater than
the number of beads, then there will be a correlation
between the synthesis path and the sequences obtained. By
combining oligonucleotide populations which are synthesized
separately, this correlation will be destroyed. Therefore,
any bias which may be inherent in the synthesis procedures
will be alleviated by joining two precursor portions into
a contiguous random oligonucleotide.
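The arithmetic behind joining two precursor portions can be made concrete. The Python sketch below counts the possible sequences for a ten-codon half and for the joined twenty-codon oligonucleotide (twenty codons per position, as in Example I), and the number of right/left pairings available from two cloned populations. The library sizes are hypothetical, and the sketch assumes that the realized diversity cannot exceed the number of transformants recovered.

```python
def sequence_space(codons_per_position: int, length: int) -> int:
    """Distinct sequences when each of `length` positions can carry
    any of `codons_per_position` codons."""
    return codons_per_position ** length

def joined_library_size(right_clones: int, left_clones: int, transformants: int) -> int:
    """Upper bound on distinct right/left combinations in the joined library:
    every right clone can pair with every left clone, but a library cannot
    hold more members than the transformants recovered (an assumption)."""
    return min(right_clones * left_clones, transformants)

half = sequence_space(20, 10)   # one ten-codon half
full = sequence_space(20, 20)   # the joined twenty-codon oligonucleotide
print(f"{half:.3e} sequences per half, {full:.3e} after joining")

# Hypothetical library sizes, for illustration only.
print(joined_library_size(10**6, 10**6, 10**8))   # 100000000
```

Even modest half-libraries offer far more pairings than can be transformed, so in this model the transformation step, not the synthesis, sets the ceiling.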

Populations of precursor oligonucleotides to be
combined into an expressible form are each cloned into
separate vectors. The two precursor portions which make up
the combined oligonucleotide correspond to the carboxy and
amino terminal portions of the expressed peptide. Each
precursor oligonucleotide can encode either the sense or
the anti-sense strand; which one depends on the orientation
of the expression elements and of the gene encoding the
fusion portion of the protein, as well as on the mechanism
used to join the two precursor oligonucleotides. For the
vectors shown in Figure 3, precursor oligonucleotides
corresponding to the carboxy terminal portion of the
peptide encode the sense strand. Those corresponding to the
amino terminal portion encode the anti-sense strand.
Oligonucleotide populations are inserted between the Eco RI
and Sac I restriction enzyme sites in M13IX22 and M13IX42
(Figure 3A and B). M13IX42 (SEQ ID NO: 1) is the vector
used for sense strand precursor oligonucleotide portions
and M13IX22 (SEQ ID NO: 2) is used for anti-sense precursor
portions.
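Because the amino terminal precursor is carried as the anti-sense strand, its coding sequence is only read after reverse complementation. A minimal Python sketch of that relationship follows, using the standard genetic code; the nine-base insert is a hypothetical example, not a sequence from the vectors.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

sense = "TTTCATGGG"                   # hypothetical coding strand: Phe-His-Gly
antisense = reverse_complement(sense)
print(antisense)                       # CCCATGAAA
print(translate(antisense))            # PMK (reading the wrong strand)
print(translate(reverse_complement(antisense)))  # FHG
```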

The populations of randomized oligonucleotides
inserted into the vectors are synthesized with Eco RI and
Sac I recognition sequences flanking opposite ends of the
random codon sequences. The sites allow annealing and
ligation of these single strand oligonucleotides into a
double stranded vector restricted with Eco RI and Sac I.
Alternatively, the oligonucleotides can be inserted into
the vector by standard mutagenesis methods. In this latter
method, single stranded vector DNA is isolated from the
phage and annealed with random oligonucleotides having
known sequences complementary to vector sequences. The
oligonucleotides are extended with DNA polymerase to
produce double stranded vectors containing the randomized
oligonucleotides.
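The role of the Eco RI and Sac I flanks can be checked by locating each enzyme's top-strand cut point. Eco RI cuts G^AATTC and Sac I cuts GAGCT^C; both sites are palindromic, so searching one strand finds every site. The Python sketch below uses a hypothetical polylinker and shows that the top-strand segment between the two cuts begins with AATTC and ends with GAGCT, the same flanks given for the synthetic oligonucleotides in Example I.

```python
# Recognition site and top-strand cut offset (bases after the site start).
SITES = {
    "Eco RI": ("GAATTC", 1),   # G^AATTC
    "Sac I": ("GAGCTC", 5),    # GAGCT^C
}

def top_strand_cuts(seq: str, enzyme: str) -> list[int]:
    """Return every top-strand cut position for `enzyme` in `seq`."""
    site, offset = SITES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts

polylinker = "CCGAATTCAAACCCGGGGAGCTCGG"   # hypothetical sequence
eco = top_strand_cuts(polylinker, "Eco RI")
sac = top_strand_cuts(polylinker, "Sac I")
print(eco, sac)                            # [3] [22]
print(polylinker[eco[0]:sac[0]])           # AATTCAAACCCGGGGAGCT
```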

The vector used for sense strand oligonucleotide
portions, M13IX42 (Figure 3B), contains, downstream of and
in frame with the Eco RI and Sac I restriction sites, a
sequence encoding the pseudo-wild type gVIII product. This
gene encodes the wild type M13 gVIII amino acid sequence
but has been changed at the nucleotide level to reduce
homologous recombination with the wild type gVIII contained
on the same vector. The wild type gVIII is present to
ensure that at least some functional, non-fusion coat
protein will be produced. The inclusion of a wild type
gVIII therefore reduces the possibility of non-viable phage
production and biological selection against certain peptide
fusion proteins. Differential regulation of the two genes
can also be used to control the relative ratio of the
pseudo and wild type proteins.
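The pseudo-wild type gVIII relies on synonymous codons: the protein is unchanged while the DNA diverges. A minimal Python sketch of that check follows. The eight-codon sequences are hypothetical toy sequences, not the actual gVIII or pseudo-gVIII sequences, and identity here is a simple position-by-position comparison without alignment.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

wild_type_like = "GCTGAGGGTGACGATCCCGCAAAA"   # hypothetical
recoded        = "GCAGAAGGCGATGACCCAGCTAAG"   # hypothetical synonymous recoding

print(translate(wild_type_like), translate(recoded))   # AEGDDPAK AEGDDPAK
print(round(identity(wild_type_like, recoded), 3))     # 0.667
```

Changing only third positions keeps the protein identical while breaking up long runs of identity, which is what lowers the chance of homologous recombination between the two genes.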

Also contained downstream of and in frame with the Eco RI
and Sac I restriction sites is an amber stop codon. The
mutation is located six codons downstream from Sac I and
therefore lies between the inserted oligonucleotides and
the gVIII sequence. Like the wild type gVIII, the amber
stop codon also reduces biological selection when combining
precursor portions to produce expressible oligonucleotides.
This is accomplished by using a non-suppressor (sup 0) host
strain, because non-suppressor strains will terminate
expression after the oligonucleotide sequences but before
the pseudo gVIII sequences. Therefore, the pseudo gVIII
will never be expressed on the phage surface under these
circumstances. Instead, only soluble peptides will be
produced. Expression in a non-suppressor strain can be
advantageously utilized when one wishes to produce large
populations of soluble peptides. Stop codons other than
amber, such as opal and ochre, or molecular switches, such
as inducible repressor elements, can also be used to unlink
peptide expression from surface expression. Additional
controls exist as well and are described below.
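The effect of the amber codon between the insert and gVIII can be mimicked with a translation function that either stops at TAG or reads through it. The Python sketch below assumes a glutamine-inserting amber suppressor (such as supE), which the text does not specify, and treats suppression as all-or-nothing, whereas real suppression is partial; the insert and downstream coat sequence are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def express(seq: str, amber_suppressor: bool) -> str:
    """Translate from the first codon. In a non-suppressor (sup 0) host,
    TAG ends translation; in the assumed suppressor host it reads as Gln."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "TAG" and amber_suppressor:
            peptide.append("Q")       # assumed glutamine-inserting suppressor
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break                     # stop codon: a soluble peptide is released
        peptide.append(residue)
    return "".join(peptide)

insert = "TTTCATGGG"      # hypothetical random-codon insert (Phe-His-Gly)
coat = "GCTGAGGGT"        # hypothetical start of the pseudo gVIII sequence
construct = insert + "TAG" + coat

print(express(construct, amber_suppressor=False))  # FHG (soluble peptide)
print(express(construct, amber_suppressor=True))   # FHGQAEG (fusion protein)
```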

The vector used for anti-sense strand oligonucleotide
portions, M13IX22 (Figure 3A), contains the expression
elements for the peptide fusion proteins. Upstream of and in
frame with the Sac I and Eco RI sites in this vector is a
leader sequence for surface expression. A ribosome binding
site and Lac Z promoter/operator elements are present for
transcription and translation of the peptide fusion
proteins.

Both vectors contain a pair of Fok I restriction
enzyme sites (Figure 3A and B) for joining together two
precursor oligonucleotide portions and their vector
sequences. One site is located at the end of each
precursor oligonucleotide which is to be joined. The
second Fok I site within the vectors is located at the end
of the vector sequences which are to be joined. The
overhang of this second Fok I site has been altered to
include a sequence which is not found in the overhangs
produced at the first Fok I site within the oligonucleotide
portions. The two sites allow the cleavage of each
circular vector into two portions and subsequent ligation
of essential components within each vector into a single
circular vector where the two oligonucleotide precursor
portions form a contiguous sequence. The non-compatible
overhangs produced at the two Fok I sites allow optimal
conditions to be selected for performing concatemerization
or circularization reactions for joining the two vector
portions. Such selection of conditions can be used to
govern the reaction order and therefore increase the
efficiency of joining.
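The value of giving the vector-side Fok I site an overhang not found among the oligonucleotide overhangs can be shown with a simple compatibility test: two 5' overhangs, each written 5' to 3', can anneal when one is the reverse complement of the other. All overhang sequences in this Python sketch are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def compatible(overhang_a: str, overhang_b: str) -> bool:
    """True when two 5' overhangs (each written 5'->3') can base-pair."""
    return overhang_a == reverse_complement(overhang_b)

oligo_overhangs = ["AAAA", "TTTT", "ACAC", "GTGT"]   # hypothetical junction ends
vector_overhangs = ["CTGA", "TCAG"]                 # hypothetical altered ends

# Ends that can join oligonucleotide to oligonucleotide, vector to vector,
# and (undesirably) oligonucleotide to vector.
oligo_pairs = [(a, b) for a in oligo_overhangs for b in oligo_overhangs
               if a < b and compatible(a, b)]
vector_pairs = [(a, b) for a in vector_overhangs for b in vector_overhangs
                if a < b and compatible(a, b)]
cross_pairs = [(a, b) for a in oligo_overhangs for b in vector_overhangs
               if compatible(a, b)]

print(oligo_pairs)    # [('AAAA', 'TTTT'), ('ACAC', 'GTGT')]
print(vector_pairs)   # [('CTGA', 'TCAG')]
print(cross_pairs)    # []
```

Because no cross pairs exist, the oligonucleotide junction and the vector junction can be ligated under separately chosen conditions without interfering with each other.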

Fok I is a restriction enzyme whose recognition
sequence is distal to the point of cleavage. This
separation of the recognition sequence from the cleavage
point is important because, if the two were superimposed
within the oligonucleotide portions to be combined, the
recognition sequence would create an invariant codon
sequence at the juncture. To avoid forming invariant codons
at the juncture, Fok I recognition sequences can be placed
outside of the random codon sequence and still be used to
cut within the random sequence. Subsequent annealing of the
single-strand overhangs produced by Fok I and ligation of
the two oligonucleotide precursor portions allows the
juncture to be formed. A variety of restriction enzymes cut
DNA by this mechanism and can be used instead of Fok I to
join precursor oligonucleotides without creating invariant
codon sequences. Such enzymes include, for example, Alw I,
Bbv I, Bsp MI, Hga I, Hph I, Mbo II, Mnl I, Ple I and
Sfa NI. One skilled in the art knows how to substitute Fok I
recognition sequences with alternative enzyme recognition
sequences such as those above, and use the appropriate
enzyme for joining precursor oligonucleotide portions.
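Fok I recognizes GGATG and cuts 9 bases beyond it on that strand and 13 bases beyond it on the complementary strand, GGATG(9/13), leaving a four-base 5' overhang outside the recognition site. The Python sketch below locates the cut points for forward-orientation sites only (a site reading CATCC on the top strand, which cuts in the opposite direction, is ignored for brevity) and uses a hypothetical sequence in which random codons lie downstream of the recognition site.

```python
FOKI_SITE = "GGATG"
TOP_OFFSET, BOTTOM_OFFSET = 9, 13     # GGATG(9/13)

def foki_cuts(seq: str) -> list[tuple[int, int, str]]:
    """For each forward GGATG site, return (top-strand cut, bottom-strand cut,
    overhang), where the overhang is the top-strand bases between the cuts."""
    results = []
    start = seq.find(FOKI_SITE)
    while start != -1:
        site_end = start + len(FOKI_SITE)
        top, bottom = site_end + TOP_OFFSET, site_end + BOTTOM_OFFSET
        if bottom <= len(seq):        # skip sites too close to the end
            results.append((top, bottom, seq[top:bottom]))
        start = seq.find(FOKI_SITE, start + 1)
    return results

# Hypothetical: recognition site, 9 spacer bases, then random codons.
seq = "GGATG" + "CAACGTACG" + "TTTCATGGCA"
print(foki_cuts(seq))   # [(14, 18, 'TTTC')]
```

The overhang is made entirely of bases from the random region, which is why the recognition sequence does not leave an invariant codon at the juncture.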

Although the sequences of the precursor
oligonucleotides are random and the two precursor
populations will invariably contain oligonucleotides whose
sequences are sufficiently complementary to anneal after
cleavage, the efficiency of annealing can be increased by
ensuring that each single-strand overhang within one
precursor population has a complementary sequence within
the second precursor population. This can be accomplished
by synthesizing a non-degenerate series of known sequences
at the Fok I cleavage site coding for each of the twenty
amino acids. Since the Fok I cleavage site contains a four
base overhang, forty different sequences are needed to
randomly encode all twenty amino acids. For example, if two
precursor populations of ten codons in length are to be
combined, then after the ninth codon position is
synthesized, the mixed population of supports for each
population is divided into forty reaction vessels, and
complementary sequences for each of the corresponding
reaction vessels between populations are independently
synthesized. The sequences are shown in Tables III and VI
of Example I, where the oligonucleotides on columns 1R
through 40R form complementary overhangs with the
oligonucleotides on the corresponding columns 1L through
40L once cleaved. The degenerate X positions in Table VI
are necessary to maintain the reading frame once the
precursor oligonucleotide portions are joined. However,
restriction enzymes which produce a blunt end, such as
Mnl I, can alternatively be used in place of Fok I to avoid
the degeneracy introduced in maintaining the reading frame.

The last feature exhibited by each of the vectors is
an amber stop codon located in an essential coding sequence
within the vector portion lost during combining (Figure
3C). The amber stop codon is present to select for viable
phage produced only from the proper combination of
precursor oligonucleotides and their vector sequences into
a single vector species. Other nonsense mutations or
selectable markers can work as well.

The combining step randomly brings together different
precursor oligonucleotides within the two populations into
a single vector (Figure 3C; M13IX). The vector sequences
donated from each independent vector, M13IX22 and M13IX42,
are necessary for production of viable phage. Also, since
the expression elements are contained in M13IX22 and the
gVIII sequences are contained in M13IX42, expression of the
peptide fusion proteins cannot be accomplished until the
sequences are linked in M13IX (Figure 3C). Any vectors
generated which contain an amber stop codon will not
produce viable phage when introduced into a non-suppressor
strain (Figure 3D). Therefore, only the sequences which do
not contain an amber stop codon will make up the final
population of vectors contained in the library. These
vector sequences are the sequences required for surface
expression of randomized peptides. By analogous
methodology, more than two vector portions can be combined
into a single vector which expresses random peptides.
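The selection step can be reduced to a simple rule. In the toy Python model below, each parental vector is split into a portion that is kept and a portion carrying the amber codon in an essential gene; a product survives in a non-suppressor host only if it contains portions from both parents and no amber codon. The portion names are hypothetical labels, and the model ignores which ends are physically able to ligate.

```python
from itertools import combinations_with_replacement

# Hypothetical labels: (donor vector, carries an amber codon in an essential gene)
PORTIONS = {
    "IX42-kept": ("M13IX42", False),   # right-half insert plus gVIII sequences
    "IX42-lost": ("M13IX42", True),
    "IX22-kept": ("M13IX22", False),   # left-half insert plus expression elements
    "IX22-lost": ("M13IX22", True),
}

def viable_in_sup0(a: str, b: str) -> bool:
    """A two-portion product gives viable phage in a non-suppressor host only
    if it combines both parental vectors and carries no amber codon."""
    donors = {PORTIONS[a][0], PORTIONS[b][0]}
    has_amber = PORTIONS[a][1] or PORTIONS[b][1]
    return donors == {"M13IX42", "M13IX22"} and not has_amber

products = list(combinations_with_replacement(sorted(PORTIONS), 2))
survivors = [p for p in products if viable_in_sup0(*p)]
print(len(products), survivors)   # 10 [('IX22-kept', 'IX42-kept')]
```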

The invention provides for a method of selecting
peptides capable of being bound by a ligand binding protein
from a population of random peptides by (a) operationally
linking a diverse population of first oligonucleotides
having a desirable bias of random codon sequences to a
first vector, (b) operationally linking a diverse
population of second oligonucleotides having a desirable
bias of random codon sequences to a second vector, (c)
combining the vector products of steps (a) and (b) under
conditions where said populations of first and second
oligonucleotides are joined together into a population of
combined vectors; (d) introducing said population of
combined vectors into a compatible host under conditions
sufficient for expressing said population of random
peptides; and (e) determining the peptides which bind to
said binding protein. The invention also provides for
determining the encoding nucleic acid sequence of such
peptides.
Surface expression of the random peptide library is
performed in an amber suppressor strain. As described
above, the amber stop codon between the random codon
sequence and the gVIII sequence unlinks the two components
in a non-suppressor strain. Isolating the phage produced
in the non-suppressor strain and infecting a suppressor
strain will link the random codon sequences to the gVIII
sequence during expression (Figure 3E). Culturing the
suppressor strain after infection allows the expression of
all peptide species within the library as gVIII-peptide
fusion proteins. Alternatively, the DNA can be isolated
from the non-suppressor strain and then introduced into a
suppressor strain to accomplish the same effect.

The level of expression of gVIII-peptide fusion
proteins can additionally be controlled at the
transcriptional level. The gVIII-peptide fusion proteins
are under the inducible control of the Lac
promoter/operator system. Other inducible promoters can
work as well and are known by one skilled in the art. For
high levels of surface expression, the suppressor library
is cultured in an inducer of the Lac Z promoter such as
isopropylthio-β-galactoside (IPTG). Inducible control is
beneficial because biological selection against
non-functional gVIII-peptide fusion proteins can be
minimized by culturing the library under non-expressing
conditions. Expression can then be induced only at the time
of screening to ensure that the entire population of
oligonucleotides within the library is represented on the
phage surface. Also, this can be used to control the
valency of the peptide on the phage surface.

The surface expression library is screened for
specific peptides which bind ligand binding proteins by
standard affinity isolation procedures. Such methods
include, for example, panning, affinity chromatography and
solid phase blotting procedures. Panning as described by
Parmley and Smith, Gene 73:305-318 (1988), which is
incorporated herein by reference, is preferred because high
titers of phage can be screened easily, quickly and in
small volumes. Furthermore, this procedure can select
minor peptide species within the population, which
otherwise would have been undetectable, and amplify them to
substantially homogeneous populations. The selected peptide
sequences can be determined by sequencing the nucleic acid
encoding such peptides after amplification of the phage
population.
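Why a few rounds of panning can bring an initially undetectable species to near homogeneity follows from repeated enrichment. The Python sketch below assumes, hypothetically, that each round recovers a fixed fraction of binding and non-binding phage and that amplification between rounds does not change the population's composition; the recovery values are illustrative, not measured.

```python
def panning_rounds(binder_fraction: float, binder_recovery: float,
                   background_recovery: float, rounds: int) -> list[float]:
    """Fraction of binding phage in the population after each round."""
    fractions = [binder_fraction]
    for _ in range(rounds):
        bound = fractions[-1] * binder_recovery
        background = (1 - fractions[-1]) * background_recovery
        fractions.append(bound / (bound + background))
    return fractions

# Hypothetical: one binder in 10^8 phage, 10% of binders and 0.001% of
# non-binders recovered per round (a 10^4-fold enrichment per round).
history = panning_rounds(1e-8, 0.1, 1e-5, 3)
for round_number, fraction in enumerate(history):
    print(round_number, f"{fraction:.2e}")
```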

The invention provides a plurality of procaryotic 
cells containing a diverse population of oligonucleotides 
having a desirable bias of random codon sequences that are
operationally linked to expression sequences. The 
15 invention provides for methods of constructing such 
populations of cells as well. 

Random oligonucleotides synthesized by any of the
methods described previously can also be expressed on the
surface of filamentous bacteriophage, such as M13, for
example, without the joining together of precursor
oligonucleotides. A vector such as M13IX30 can be used.
This vector exhibits all the functional features of the
combined vector shown in Figure 3C for surface expression
of gVIII-peptide fusion proteins. The complete nucleotide
sequence for M13IX30 (SEQ ID NO: 3) is shown in Figure 7.

M13IX30 contains a wild type gVIII for phage viability
and a pseudo gVIII sequence for peptide fusions. The
vector also contains in-frame restriction sites for cloning
random peptides. The cloning sites in this vector are ^o I,
Stu I and Spe I. Oligonucleotides should therefore be
synthesized with the appropriate complementary ends for
annealing and ligation or insertional mutagenesis.
Alternatively, the appropriate termini can be generated by
PCR technology. Between the restriction sites and the
pseudo gVIII sequence is an in-frame amber stop codon,
again ensuring complete viability of phage in constructing
and manipulating the library. Expression and screening are
performed as described above for the surface expression
library of oligonucleotides generated from precursor
portions.

Thus, the invention provides a method of selecting
peptides capable of being bound by a ligand binding protein
from a population of random peptides by (a) operationally
linking a diverse population of oligonucleotides having a
desirable bias of random codon sequences to expression
elements; (b) introducing said population of vectors into
a compatible host under conditions sufficient for
expressing said population of random peptides; and (c)
determining the peptides which bind to said binding
protein. Also provided is a method for determining the
encoding nucleic acid sequence of such selected peptides.






The following examples are intended to illustrate, but 
not limit the invention. 



EXAMPLE I 



Determination and Characterization of Peptide Ligands Generated
from Right and Left Half Random Oligonucleotides

This example shows the synthesis of random
oligonucleotides and the construction and expression of
surface expression libraries of the encoded randomized
peptides. The random peptides of this example derive from
the mixing and joining together of two random
oligonucleotides. Also demonstrated is the isolation and
characterization of peptide ligands and their corresponding
nucleotide sequences for specific binding proteins.
Synthesis of Random Oligonucleotides

Oligonucleotides having random codons were synthesized as
two smaller portions, which correspond to the two halves of
the final randomized oligonucleotide. The two halves are
designated the right half and the left half. Each
population of right and left half oligonucleotides is ten
codons in length with twenty random codons at each
position. The sense sequence of the right half corresponds
to the carboxy terminal half of the randomized
oligonucleotides and encodes the carboxy terminal half of
the expressed peptides. The left half corresponds to the
anti-sense sequence of the randomized oligonucleotides and
encodes the amino terminal half of the expressed peptides.
The right and left half randomized oligonucleotide
populations are cloned into separate vector species and
then combined so that the right and left halves come
together in random combination to produce a single
expression vector species bearing a population of
randomized oligonucleotides twenty codons in length.
Electroporation of the vector population into an
appropriate host produces phage which express the random
peptides on their surface.

The reaction vessels for oligonucleotide synthesis
were obtained from the manufacturer of the automated
synthesizer (Millipore, Burlington, MA; MilliGen/Biosearch
Cyclone Plus Synthesizer). The vessels were supplied as
packages containing reaction columns (1 µmole), frits,
crimps, plugs and underivatized control pore glass
(catalog # GBH 860458). Derivatized control pore glass,
phosphoramidite nucleotides and synthesis reagents were
obtained from MilliGen/Biosearch, and additional materials
from Fisher Scientific Co., Pittsburgh, PA (catalog numbers
OS-.O.-ZO and oe-.oe-as.).
Ten reaction columns were used for right half
synthesis of random oligonucleotides ten codons in length.
The oligonucleotides have 5 monomers at their 3' end of the
sequence 5'GAGCT3' and 8 monomers at their 5' end of the
sequence 5'AATTCCAT3'. The synthesizer was fitted with a
column derivatized with a thymine nucleotide (T-column,
MilliGen/Biosearch # 0615.50) and was programmed to
synthesize the sequences shown in Table I for each of ten
columns in independent reaction sets. The sequence of the
last three monomers (read from right to left, since
synthesis proceeds from the 3' to the 5' end) encodes the
indicated amino acids:

Table I

Column        Sequence (5' to 3')   Amino Acids

column 1R     (T/G)TTGAGCT          Phe and Val
column 2R     (T/C)CTGAGCT          Ser and Pro
column 3R     (T/C)ATGAGCT          Tyr and His
column 4R     (T/C)GTGAGCT          Cys and Arg
column 5R     (C/A)TGGAGCT          Leu and Met
column 6R     (C/G)AGGAGCT          Gln and Glu
column 7R     (A/G)CTGAGCT          Thr and Ala
column 8R     (A/G)ATGAGCT          Asn and Asp
column 9R     (T/G)GGGAGCT          Trp and Gly
column 10R    A(T/A)AGAGCT          Ile and Lys

where the two monomers in parentheses denote a single
monomer position within the codon and indicate that an
equal mixture of each monomer was added to the reaction for
coupling. The monomer coupling reactions for each of the
ten columns were performed as recommended by the
manufacturer (amidite version S1.06, # 8400-050990, scale
1 µM). After the last coupling reaction, the columns were
washed with acetonitrile and lyophilized to dryness.
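The column design in Table I can be checked mechanically: expanding each mixed-monomer position and translating the codons should yield all twenty amino acids, two per column, and no stop codons. A minimal Python sketch follows; it uses only the codon portion of each Table I entry (the trailing GAGCT is the fixed 3' flank) and reads "(X/Y)" as an equal mixture of X and Y at that position, as the text defines it. For example, A(T/A)A expands to ATA (Ile) and AAA (Lys).

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Codon portion of each Table I entry (written 5' to 3').
TABLE_I = {
    "1R": "(T/G)TT", "2R": "(T/C)CT", "3R": "(T/C)AT", "4R": "(T/C)GT",
    "5R": "(C/A)TG", "6R": "(C/G)AG", "7R": "(A/G)CT", "8R": "(A/G)AT",
    "9R": "(T/G)GG", "10R": "A(T/A)A",
}

def expand(pattern: str) -> list[str]:
    """Expand each '(X/Y)' mixed-monomer position into its alternatives."""
    tokens = re.findall(r"\([ACGT]/[ACGT]\)|[ACGT]", pattern)
    choices = [t.strip("()").split("/") if t.startswith("(") else [t] for t in tokens]
    return ["".join(p) for p in product(*choices)]

per_column = {col: {CODON_TABLE[c] for c in expand(pat)} for col, pat in TABLE_I.items()}
encoded = set().union(*per_column.values())
print(per_column["10R"] == {"I", "K"})          # True
print(len(encoded), "".join(sorted(encoded)))   # 20 ACDEFGHIKLMNPQRSTVWY
```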

Following synthesis, the plugs were removed from each
column using a decrimper and the reaction products were
poured into a single weigh boat. Initially the bead mass
increases, due to the weight of the monomers; however, at
later rounds of synthesis material is lost. In either
case the material was equalized with underivatized control
pore glass and mixed thoroughly to obtain a random
distribution of all twenty codon species. The reaction
products were then aliquotted into 10 new reaction columns
by removing 25 mg of material at a time and placing it into
separate reaction columns. Alternatively, the reaction
products can be aliquotted by suspending the beads in a
liquid that is dense enough for the beads to remain
dispersed, preferably a liquid that is equal in density to
the beads, and then aliquoting equal volumes of the
suspension into separate reaction columns. The lip on the
inside of the columns where the frits rest was cleared of
material using vacuum suction with a syringe and 25 G
needle. New frits were placed onto the lips, the plugs
were fitted into the columns and were crimped into place
using a crimper.

Synthesis of the second codon position was achieved
using the above 10 columns containing the random mixture of
reaction products from the first codon synthesis. The
monomer coupling reactions for the second codon position
are shown in Table II. An A in the first position means
that any monomer can be programmed into the synthesizer.
At that position, the first monomer position is not coupled
by the synthesizer since the software assumes that the
monomer is already attached to the column. An A also
denotes that the columns from the previous codon synthesis
should be placed on the synthesizer for use in the present
synthesis round. Reactions were again sequentially
repeated for each column as shown in Table II and the
reaction products washed and dried as described above.
Table II

Column        Sequence (5' to 3')   Amino Acids

column 1R     (T/G)TTA              Phe and Val
column 2R     (T/C)CTA              Ser and Pro
column 3R     (T/C)ATA              Tyr and His
column 4R     (T/C)GTA              Cys and Arg
column 5R     (C/A)TGA              Leu and Met
column 6R     (C/G)AGA              Gln and Glu
column 7R     (A/G)CTA              Thr and Ala
column 8R     (A/G)ATA              Asn and Asp
column 9R     (T/G)GGA              Trp and Gly
column 10R    A(T/A)AA              Ile and Lys


Randomization of the second codon position was achieved by
removing the reaction products from each of the columns and
thoroughly mixing the material. The material was again
divided into new reaction columns and prepared for monomer
coupling reactions as described above.
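The mix-and-divide cycle can be simulated to see why it gives an even distribution of the twenty codon species at every position. In the Python sketch below each simulated bead stands for a single growing chain (a simplification: a real bead carries many chains, and a mixed-monomer step puts both alternatives on the same bead). The beads are shuffled, dealt evenly into the ten columns of Table II, and each column adds one of its two codons at random; the bead count and seed are arbitrary.

```python
import random
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The two codons delivered by each of the ten columns (Tables I and II).
COLUMN_CODONS = [
    ("TTT", "GTT"), ("TCT", "CCT"), ("TAT", "CAT"), ("TGT", "CGT"), ("CTG", "ATG"),
    ("CAG", "GAG"), ("ACT", "GCT"), ("AAT", "GAT"), ("TGG", "GGG"), ("ATA", "AAA"),
]

def split_and_pool(n_beads: int, n_positions: int, seed: int = 0) -> list[list[str]]:
    """Return each simulated chain as a list of codons in synthesis order."""
    rng = random.Random(seed)
    beads: list[list[str]] = [[] for _ in range(n_beads)]
    for _ in range(n_positions):
        rng.shuffle(beads)                     # pool and mix all supports
        for index, chain in enumerate(beads):  # divide evenly among the columns
            codons = COLUMN_CODONS[index % len(COLUMN_CODONS)]
            chain.append(rng.choice(codons))   # equal monomer mixture at one position
    return beads

beads = split_and_pool(n_beads=20_000, n_positions=3)
for position in range(3):
    counts = Counter(CODON_TABLE[chain[position]] for chain in beads)
    print(position + 1, len(counts), min(counts.values()), max(counts.values()))
```

Each column receives exactly one tenth of the beads at every position, so the only variation left is the coin-flip between a column's two codons.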

Random synthesis of the next seven codons (positions
3 through 9) proceeded identically to the cycle described
above for the second codon position and again used the
monomer sequences of Table II. Each of the newly repacked
columns containing the random mixture of reaction products
from synthesis of the previous codon position was used for
the synthesis of the subsequent codon position. After
synthesis of the codon at position nine and mixing of the
reaction products, the material was divided and repacked
into 40 different columns and the monomer sequences shown
in Table III were coupled to each of the 40 columns in
independent reactions. The oligonucleotides from each of
the 40 columns were mixed once more and cleaved from the
control pore glass as recommended by the manufacturer.
Table III

Column        Sequence (5' to 3')

column 1R     AATTCTTTTA
column 2R     AATTCTGTTA
column 3R     AATTCGTTTA
column 4R     AATTCGGTTA
column 5R     AATTCTTCTA
column 6R     AATTCTCCTA
column 7R     AATTCGTCTA
column 8R     AATTCGCCTA
column 9R     AATTCTTATA
column 10R    AATTCTCATA
column 11R    AATTCGTATA
column 12R    AATTCGCATA
column 13R    AATTCTTGTA
column 14R
column 15R    AATTCGTGTA
column 16R    AATTCGCGTA
column 17R    AATTCTCTGA
column 18R    AATTCTATGA
column 19R    AATTCGCTGA
column 20R    AATTCGATGA
column 21R    AATTCTCAGA
column 22R    AATTCTGAGA
column 23R    AATTCGCAGA
column 24R    AATTCGGAGA
column 25R    AATTCTACTA
column 26R    AATTCTGCTA
column 27R    AATTCGACTA
column 28R    AATTCGGCTA
column 29R    AATTCTAATA
column 30R    AATTCTGATA
column 31R    AATTCGAATA
column 32R    AATTCGGATA
column 33R    AATTCTTGGA
column 34R    AATTCTGGGA
column 35R    AATTCGTGGA
column 36R    AATTCGGGGA
column 37R    AATTCTATAA
column 38R    AATTCTAAAA
column 39R    AATTCGATAA
column 40R    AATTCGAAAA

Left half synthesis of random oligonucleotides
proceeded similarly to the right half synthesis. This half
of the oligonucleotide corresponds to the anti-sense
sequence of the encoded randomized peptides. Thus, the
complementary sequences of the codons in Tables I through
III are synthesized. The left half oligonucleotides also
have 5 monomers at their 3' end of the sequence 5'GAGCT3'
and 8 monomers at their 5' end of the sequence
5'AATTCCAT3'. The rounds of synthesis, washing, drying,
mixing, and dividing are as described above.

For the first codon position, the synthesizer was
fitted with a T-column and programmed to synthesize the
sequences shown in Table IV for each of ten columns in
independent reaction sets. As with right half synthesis,
the sequence of the last three monomers (from right to
left) encodes the indicated amino acids:
Table IV

Column        Sequence (5' to 3')   Amino Acids

column 1L     AA(A/C)GAGCT          Phe and Val
column 2L     AG(A/G)GAGCT          Ser and Pro
column 3L     AT(A/G)GAGCT          Tyr and His
column 4L     AC(A/G)GAGCT          Cys and Arg
column 5L     CA(G/T)GAGCT          Leu and Met
column 6L     CT(G/C)GAGCT          Gln and Glu
column 7L     AG(T/C)GAGCT          Thr and Ala
column 8L     AT(T/C)GAGCT          Asn and Asp
column 9L     CC(A/C)GAGCT          Trp and Gly
column 10L    T(A/T)TGAGCT          Ile and Lys
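The same kind of check applies to the left half, with one extra step: the Table IV triplets are anti-sense, so each expanded triplet is reverse-complemented before translation. The Python sketch below, again using only the triplet portion of each entry, confirms that the ten columns together encode the same twenty amino acids as the right half, two per column.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Anti-sense triplet portion of each Table IV entry (written 5' to 3').
TABLE_IV = {
    "1L": "AA(A/C)", "2L": "AG(A/G)", "3L": "AT(A/G)", "4L": "AC(A/G)",
    "5L": "CA(G/T)", "6L": "CT(G/C)", "7L": "AG(T/C)", "8L": "AT(T/C)",
    "9L": "CC(A/C)", "10L": "T(A/T)T",
}

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def expand(pattern: str) -> list[str]:
    """Expand each '(X/Y)' mixed-monomer position into its alternatives."""
    tokens = re.findall(r"\([ACGT]/[ACGT]\)|[ACGT]", pattern)
    choices = [t.strip("()").split("/") if t.startswith("(") else [t] for t in tokens]
    return ["".join(p) for p in product(*choices)]

per_column = {
    col: {CODON_TABLE[reverse_complement(t)] for t in expand(pat)}
    for col, pat in TABLE_IV.items()
}
print(sorted(per_column["1L"]), sorted(per_column["6L"]))   # ['F', 'V'] ['E', 'Q']
print(len(set().union(*per_column.values())))               # 20
```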



Following washing and drying, the plugs for each column
were removed, and the contents were mixed and aliquotted
into ten new reaction columns as described above.
Synthesis of the second codon position was achieved using
these ten columns containing the random mixture of reaction
products from the first codon synthesis. The monomer
coupling reactions for the second codon position are shown
in Table V.

Table V

Column        Sequence (5' to 3')   Amino Acids

column 1L     AA(A/C)A              Phe and Val
column 2L     AG(A/G)A              Ser and Pro
column 3L     AT(A/G)A              Tyr and His
column 4L     AC(A/G)A              Cys and Arg
column 5L     CA(G/T)A              Leu and Met
column 6L     CT(G/C)A              Gln and Glu
column 7L     AG(T/C)A              Thr and Ala
column 8L     AT(T/C)A              Asn and Asp
column 9L     CC(A/C)A              Trp and Gly
column 10L    T(A/T)TA              Ile and Lys
Again, randomization of the second codon position was
achieved by removing the reaction products from each of the
columns and thoroughly mixing the beads. The beads were
repacked into ten new reaction columns.

Random synthesis of the next seven codon positions 
proceeded identically to the cycle described above for the 
second codon position and again used the monomer sequences 
of Table V. After synthesis of the codon at position nine 
and mixing of the reaction products, the material was 
divided and repacked into 40 different columns and the 
monomer sequences shown in Table VI were coupled to each of 
the 40 columns in independent reactions. 



Table VI

Column        Sequence (5' to 3')

column 1L     AATTCCATAAAAXXA
column 2L     AATTCCATAAACXXA
column 3L     AATTCCATAACAXXA
column 4L     AATTCCATAACCXXA
column 5L     AATTCCATAGAAXXA
column 6L     AATTCCATAGACXXA
column 7L     AATTCCATAGGAXXA
column 8L     AATTCCATAGGCXXA
column 9L     AATTCCATATAAXXA
column 10L    AATTCCATATACXXA
column 11L    AATTCCATATGAXXA
column 12L    AATTCCATATGCXXA
column 13L    AATTCCATAGAAXXA
column 14L    AATTCCATACACXXA
column 15L    AATTCCATAGGAXXA
column 16L    AATTCCATAGGCXXA
column 17L    AATTCCATCAGAXXA
column 18L    AATTCCATCAGCXXA
column 19L    AATTCCATCATAXXA
column 20L    AATTCCATCATCXXA
column 21L    AATTCCATCTGAXXA
column 22L    AATTCCATCTGCXXA
column 23L    AATTCCATCTCAXXA
column 24L    AATTCCATCTCCXXA
column 25L    AATTCCATAGTAXXA
column 26L    AATTCCATAGTCXXA
column 27L    AATTCCATAGCAXXA
column 28L    AATTCCATAGCCXXA
column 29L    AATTCCATATTAXXA
column 30L    AATTCCATATTCXXA
column 31L    AATTCCATATCAXXA
column 32L    AATTCCATATCCXXA
column 33L    AATTCCATCCAAXXA
column 34L    AATTCCATCCACXXA
column 35L    AATTCCATCCCAXXA
column 36L    AATTCCATCCCCXXA
column 37L    AATTCCATTATAXXA
column 38L    AATTCCATTATCXXA
column 39L    AATTCCATTTTAXXA
column 40L    AATTCCATTTTCXXA



20 

The first two monomers, denoted "X", each represent an equal mixture of all four nucleotides at that position. This is necessary to retain a relatively unbiased codon sequence at the junction between the right and left half oligonucleotides. The above right and left half random oligonucleotides were cleaved from the supports, purified and used to construct the surface expression libraries described below.
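To make the "X" notation concrete, the short Python sketch below expands a Table VI entry into every sequence it represents, assuming, as stated above, that each X is an equimolar mixture of A, C, G and T. The helper name `expand_x` is hypothetical; it illustrates the arithmetic and is not part of the synthesis protocol.

```python
from itertools import product

def expand_x(template: str, alphabet: str = "ACGT") -> list[str]:
    """Return every sequence encoded by a template in which each 'X'
    stands for an equal mixture of the four nucleotides."""
    slots = [alphabet if base == "X" else base for base in template.upper()]
    return ["".join(combo) for combo in product(*slots)]

variants = expand_x("AATTCCATAAAAXXA")  # column 1L of Table VI
print(len(variants))       # 16: two X positions give 4 x 4 combinations
print(1 / len(variants))   # 0.0625: expected fraction of each variant
```

With equimolar mixing, every dinucleotide at the two X positions is expected at the same frequency, so none is favored at the junction.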






Vector Construction 



Two M13-based vectors, M13IX42 (SEQ ID NO: 1) and M13IX22 (SEQ ID NO: 2), were constructed for the cloning and propagation of right and left half populations of random oligonucleotides, respectively. The vectors were specially constructed to facilitate the random joining and subsequent expression of right and left half oligonucleotide populations. Each vector within the population contains one right and one left half oligonucleotide from the population joined together to form a single contiguous oligonucleotide with random codons which is twenty-two codons in length. The resultant population of vectors is used to construct a surface expression library.

M13IX42, or the right-half vector, was constructed to harbor the right half populations of randomized oligonucleotides. M13mp18 (Pharmacia, Piscataway, NJ) was the starting vector. This vector was genetically modified to contain, in addition to the encoded wild type M13 gene VIII already present in the vector: (1) a pseudo-wild type M13 gene VIII sequence with a stop codon (amber) placed between it and an Eco RI-Sac I cloning site for randomized oligonucleotides; (2) a pair of Fok I sites to be used for joining with M13IX22, the left-half vector; (3) a second amber stop codon placed on the opposite side of the vector from the portion being combined with the left-half vector; and (4) various other mutations to remove redundant restriction sites and the amino terminal portion of Lac Z.

The pseudo-wild type M13 gene VIII was used for surface expression of random peptides. The pseudo-wild type gene encodes the same amino acid sequence as the wild type gene; however, its nucleotide sequence has been altered so that only 53% identity exists between this gene and the encoded wild type gene VIII. Modifying the nucleotide sequence of the gene VIII copy used for surface expression reduces the possibility of homologous recombination with the wild type gene VIII contained on the same vector. Additionally, the wild type M13 gene VIII was retained in the vector system to ensure that at least some functional, non-fusion coat protein would be produced. The inclusion of wild type gene VIII therefore reduces the possibility of non-viable phage production from the random peptide fusion genes.
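The 53% figure is a position-by-position nucleotide identity between two genes that translate to the same protein. The Python sketch below shows how such synonymous recoding can be checked: translate both sequences with the standard genetic code, confirm the proteins match, then count identical nucleotides. It uses two short hypothetical fragments rather than the actual gene VIII sequences, and the function names are illustrative only.

```python
# Standard genetic code; codons enumerated with bases in "TCAG" order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[n:n + 3]] for n in range(0, len(dna) - 2, 3))

def percent_identity(a: str, b: str) -> float:
    """Position-by-position identity of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

# Hypothetical fragments: both encode Ala-Glu-Gly with different codons.
original = "GCTGAAGGC"
recoded = "GCAGAGGGT"
assert translate(original) == translate(recoded) == "AEG"
print(round(percent_identity(original, recoded), 1))  # 66.7
```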

The pseudo-wild type gene VIII was constructed by chemically synthesizing a series of oligonucleotides which encode both strands of the gene. The oligonucleotides are presented in Table VII (SEQ ID NOS: 7 through 16).

TABLE VII

Pseudo-Wild Type Gene VIII Oligonucleotide Series

Top Strand
Oligonucleotides    Sequence (5' to 3')
VIII 03             GATCC TAG GCT GAA GGC GAT GAG CCT GCT AAG GCT GC
VIII 04             A TTC AAT AGT TTA CAG GCA AGT GCT ACT GAG TAC A
VIII 05             TT GGC TAC GCT TGG GCT ATG GTA GTA GTT ATA GTT
VIII 06             GGT GCT ACC ATA GGG ATT AAA TTA TTC AAA AAG TT
VIII 07             T ACG AGC AAG GCT TCT TA

Bottom Strand
Oligonucleotides    Sequence (5' to 3')
VIII 08             AGC TTA AGA AGC CTT GCT CGT AAA CTT TTT GAA TAA TTT
VIII 09             AAT CCC TAT GGT AGC ACC AAC TAT AAC TAC TAC CAT
VIII 10             AGC CCA AGC GTA GCC AAT GTA CTC AGT AGC ACT TG
VIII 11             C CTG TAA ACT ATT GAA TGC AGC CTT AGC AGG GTC
VIII 12             ATC GCC TTC AGC CTA G

Except for the terminal oligonucleotides VIII 03 (SEQ ID NO: 7) and VIII 08 (SEQ ID NO: 12), the above oligonucleotides (oligonucleotides VIII 04-VIII 07 and VIII 09-VIII 12 (SEQ ID NOS: 8 through 11 and 13 through 16)) were mixed at 200 ng each in 10 µl final volume and phosphorylated with T4 polynucleotide kinase (Pharmacia, Piscataway, NJ) with 1 mM ATP at 37°C for 1 hour. The reaction was stopped by heating at 65°C for 5 minutes. The terminal oligonucleotides were added to the mixture and annealed into double-stranded form by heating to 65°C for 5 minutes, followed by cooling to room temperature over a period of 30 minutes. The annealed oligonucleotides were ligated together with 1.0 U of T4 DNA ligase (BRL). The annealed and ligated oligonucleotides yield a double-stranded DNA flanked by a Bam HI site at its 5' end and by a Hind III site at its 3' end. A translational stop codon (amber) immediately follows the Bam HI site. The gene VIII sequence begins with the codon GAA (Glu) two codons 3' to the stop codon. The double-stranded insert was phosphorylated using T4 polynucleotide kinase (Pharmacia, Piscataway, NJ) and ATP (10 mM Tris-HCl, pH 7.5, 10 mM MgCl2) and cloned in frame with the Eco RI and Sac I sites within the M13 polylinker. To do so, M13mp18 was digested with Bam HI (New England Biolabs, Beverly, MA) and Hind III (New England Biolabs) and combined at a molar ratio of 1:10 with the double-stranded insert. The ligations were performed at 16°C overnight in 1X ligase buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (New England Biolabs). The ligation mixture was transformed into a host and screened for positive clones using standard procedures in the art.
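The annealing design of Table VII can be checked computationally: each bottom-strand oligonucleotide should be the reverse complement of a stretch of the top strand, with the Bam HI (GATC) and Hind III (AGCT) overhangs left single-stranded at the ends. The Python sketch below checks the two terminal pairings using the sequences as printed in Table VII; the function and variable names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Table VII entries with the codon spacing removed.
viii_03 = "GATCCTAGGCTGAAGGCGATGAGCCTGCTAAGGCTGC"    # top strand, Bam HI end
viii_07 = "TACGAGCAAGGCTTCTTA"                       # top strand, Hind III end
viii_08 = "AGCTTAAGAAGCCTTGCTCGTAAACTTTTTGAATAATTT"  # bottom strand
viii_12 = "ATCGCCTTCAGCCTAG"                         # bottom strand

# VIII 12 pairs with VIII 03 just downstream of the 4-nt GATC overhang.
print(viii_03.find(reverse_complement(viii_12)))    # 4
# VIII 08 pairs with VIII 07 and leaves its 5' AGCT unpaired.
rc_08 = reverse_complement(viii_08)
print(viii_07 in rc_08, rc_08.endswith("AGCT"))      # True True
```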

Several mutations were generated within the right-half vector to yield functional M13IX42. The mutations were generated using the method of Kunkel et al., Meth. Enzymol. 154:367-382 (1987), which is incorporated herein by reference, for site-directed mutagenesis. The reagents, strains and protocols were obtained from a Bio-Rad mutagenesis kit (Bio-Rad, Richmond, CA), and mutagenesis was performed as recommended by the manufacturer.

A Fok I site used for joining the right and left halves was generated 8 nucleotides 5' to the unique Eco RI site using the oligonucleotide 5'-CTCGAATTCGTACATCCTGGTCATAGC-3' (SEQ ID NO: 17). The second Fok I site retained in the vector is naturally encoded at position 3547; however, the sequence within the overhang was changed to encode CTTC. Two Fok I sites, at positions 239 and 7244 of M13mp18, as well as the Hind III site at the end of the pseudo gene VIII sequence, were removed from the vector using the mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 18) and 5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 19), respectively. New Hind III and Mlu I sites were also introduced at positions 3919 and 3951 of M13IX42. The oligonucleotides used for this mutagenesis had the sequences 5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 20) and 5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 21), respectively. The amino terminal portion of Lac Z was deleted by oligonucleotide-directed mutagenesis using the mutant oligonucleotide 5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: 22). This deletion also removed a third M13mp18-derived Fok I site. The distance between the Eco RI and Sac I sites was increased by inserting a spacer sequence, to ensure complete double digestion. The spacer sequence was inserted using the oligonucleotide 5'-TTCAGCCTAGGATCCGCCGAGCTCTCCTACCTGCGAATTCGTACATCC-3' (SEQ ID NO: 23). Finally, an amber stop codon was placed at position 4492 using the mutant oligonucleotide 5'-TGGATTATACTTCTAAATAATGGA-3' (SEQ ID NO: 24). The amber stop codon is used as a biological selection to ensure the proper recombination of vector sequences that brings together the right and left halves of the randomized oligonucleotides. In constructing the above mutations, all changes made in an M13 coding region were performed such that the amino acid sequence remained unaltered. It should be noted that several mutations within M13mp18 were found which differed from the published sequence. Where known, these sequence differences are recorded herein as found and therefore may not correspond exactly to the published sequence of M13mp18.
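The restriction-site bookkeeping in this paragraph can be reproduced with a simple string search. The Python sketch below scans an oligonucleotide for the recognition sequences of the enzymes named here. Fok I recognizes the non-palindromic sequence 5'-GGATG-3' and cleaves outside it, so the scan also looks for its reverse complement, CATCC, on the strand given. The dictionary and function names are hypothetical illustrations.

```python
# Recognition sequences (5' to 3') of enzymes named in the text.
SITES = {
    "Eco RI": "GAATTC",
    "Sac I": "GAGCTC",
    "Bam HI": "GGATCC",
    "Fok I": "GGATG",  # non-palindromic; CATCC marks a site on the other strand
}

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions, on the strand given, of `site` or of its
    reverse complement (i.e. a site lying on the complementary strand)."""
    seq = seq.upper()
    patterns = {site, reverse_complement(site)}  # a set collapses palindromes
    return sorted(
        i
        for pattern in patterns
        for i in range(len(seq) - len(pattern) + 1)
        if seq.startswith(pattern, i)
    )

spacer = "TTCAGCCTAGGATCCGCCGAGCTCTCCTACCTGCGAATTCGTACATCC"  # SEQ ID NO: 23
for name, site in SITES.items():
    print(name, find_sites(spacer, site))
# Eco RI [34], Sac I [18], Bam HI [9], Fok I [43] (found as CATCC)
```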

The sequence of the resultant vector, M13IX42, is shown in Figure 5 (SEQ ID NO: 1). Figure 3A also shows M13IX42 with each of the elements necessary for producing a surface expression library between right and left half randomized oligonucleotides marked. The sequence between the two Fok I sites, shown by the arrow, is the portion of M13IX42 that is combined with a portion of the left-half vector to produce random oligonucleotides expressed as fusion proteins of gene VIII.

M13IX22, or the left-half vector, was constructed to harbor the left half populations of randomized oligonucleotides. This vector was constructed from M13mp19 (Pharmacia, Piscataway, NJ) and contains: (1) two Fok I sites for mixing with M13IX42 to bring together the left and right halves of the randomized oligonucleotides; (2) sequences necessary for expression, such as a promoter, a signal sequence and translation initiation signals; (3) an Eco RI-Sac I cloning site for the randomized oligonucleotides; and (4) an amber stop codon for biological selection in bringing together right and left half oligonucleotides.

Of the two Fok I sites used for mixing M13IX22 with M13IX42, one is naturally encoded in M13mp18 and M13mp19 (at position 3547). As with M13IX42, the overhang within this naturally occurring Fok I site was changed to CTTC. The other Fok I site was introduced after construction of the translation initiation signals by site-directed mutagenesis using the oligonucleotide 5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 25).

The translation initiation signals were constructed by annealing overlapping oligonucleotides as described above to produce a double-stranded insert containing a 5' Eco RI site and a 3' Hind III site. The overlapping oligonucleotides are shown in Table VIII (SEQ ID NOS: 26 through 34) and were ligated as a double-stranded insert between the Eco RI and Hind III sites of M13mp18 as described for the pseudo gene VIII insert. The ribosome binding site (AGGAGAC) is located in oligonucleotide 015 (SEQ ID NO: 26), and the translation initiation codon (ATG) is the first three nucleotides of oligonucleotide 016 (SEQ ID NO: 27).

TABLE VIII

Oligonucleotide Series for Construction of Translation Signals in M13IX22

Oligonucleotide    Sequence (5' to 3')
015                AATT C GCC AAG GAG ACA GTC AT
016                AATG AAA TAG CTA TTG CCT ACG GCA GCC GCT GGA TTG TT
017                ATTA CTC GCT GCC CAA CCA GCC ATG GCC GAG CTC GTG AT
018                GACC CAG ACT CCA GATATC CAA CAG GAA TGA GTG TTA AT
019                TCT AGA ACG CGT C
020                ACGT G ACG CGT TCT AGA AT TAA CACTCA TTC CTG T
021                TG GAT ATC TGG AGT CTG GGT CAT CAC GAG CTC GGC CAT G
022                GC TGG TTG GGC AGC GAG TAA TAA CAA TCC AGC GGC TGC C
023                GT AGG CAA TAG GTA TTT CAT TAT GAC TGT CCT TGG CG

Oligonucleotide 017 (SEQ ID NO: 27) contained a Sac I restriction site 67 nucleotides downstream from the ATG codon. The naturally occurring Eco RI site was removed and a new site introduced 25 nucleotides downstream from the Sac I site. Oligonucleotides 5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 35) and 5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 36) were used to generate each of these mutations, respectively. An amber stop codon was also introduced at position 3263 of M13mp18 using the oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO:



In addition to the above mutations, a variety of other modifications were made to remove certain sequences and redundant restriction sites. The Lac Z ribosome binding site was removed when the original Eco RI site in M13mp18 was mutated. Also, the Fok I sites at positions 239, 6351 and 7244 of M13mp18 were likewise removed with the mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 38), 5'-CGAAAGGGGGGTGTGCTGCAA-3' (SEQ ID NO: 39) and 5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 40), respectively. Again, mutations within the coding region did not alter the amino acid sequence.

The resultant vector, M13IX22, is 7320 base pairs in length; its sequence is shown in Figure 6 (SEQ ID NO: 2). The Sac I and Eco RI cloning sites are at positions 6290 and 6314, respectively. Figure 3A also shows M13IX22 with each of the elements necessary for producing a surface expression library between right and left half randomized oligonucleotides marked.

Library Construction

Each population of right and left half randomized oligonucleotides from columns 1R through 40R and columns 1L through 40L is cloned separately into M13IX42 and M13IX22, respectively, to create sublibraries of right and left half randomized oligonucleotides. Therefore, a total of eighty sublibraries is generated. Each population of randomized oligonucleotides is maintained separately until the final screening step to ensure maximum efficiency of annealing of right and left half oligonucleotides. The greater efficiency increases the total number of randomized oligonucleotides which can be obtained. Alternatively, one can combine all forty populations of right half oligonucleotides (columns 1R-40R) into one population and all forty populations of left half oligonucleotides (columns 1L-40L) into a second population to generate just one sublibrary for each.

For the generation of sublibraries, each of the above populations of randomized oligonucleotides is cloned separately into the appropriate vector. The right half oligonucleotides are cloned into M13IX42 to generate sublibraries M13IX42.1R through M13IX42.40R. The left half oligonucleotides are similarly cloned into M13IX22 to generate sublibraries M13IX22.1L through M13IX22.40L. Each vector contains unique Eco RI and Sac I restriction enzyme sites which produce 5' and 3' single-stranded overhangs, respectively, when digested. The single-stranded overhangs are used for the annealing and ligation of the complementary single-stranded random oligonucleotides.

The randomized oligonucleotide populations are cloned between the Eco RI and Sac I sites by sequential digestion and ligation steps. Each vector is treated with an excess of Eco RI (New England Biolabs) at 37°C for 2 hours, followed by addition of 4-24 units of calf intestinal alkaline phosphatase (Boehringer Mannheim, Indianapolis, IN). Reactions are stopped by phenol/chloroform extraction and ethanol precipitation. The pellets are resuspended in an appropriate amount of distilled or deionized water (dH2O). About 10 pmol of vector is mixed with a 5000-fold molar excess of each population of randomized oligonucleotides in 10 µl of 1X ligase buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (BRL, Gaithersburg, MD). The ligation is incubated at 16°C for 16 hours. Reactions are stopped by heating at 75°C for 15 minutes, and the DNA is digested with an excess of Sac I (New England Biolabs) for 2 hours. Sac I is inactivated by heating at 75°C for 15 minutes, and the volume of the reaction mixture is adjusted to 300 µl with an appropriate amount of 10X ligase buffer and dH2O. One unit of T4 DNA ligase (BRL) is added and the mixture is incubated overnight at 16°C. The DNA is ethanol precipitated and resuspended in TE (10 mM Tris-HCl, pH 8.0, 1 mM EDTA). DNA from each ligation is electroporated into XL1 Blue cells (Stratagene, La Jolla, CA), as described below, to generate the sublibraries.
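The amounts in this ligation are easier to appreciate as masses. The Python sketch below converts picomoles of double-stranded DNA to micrograms using the common approximation of 660 g/mol per base pair (an assumption, not a figure from the text) and the 7320 bp length given above for M13IX22, then applies the 5000-fold molar excess. The function name is illustrative.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (assumed approximation)

def pmol_to_ug(pmol: float, length_bp: int) -> float:
    """Mass in micrograms of `pmol` picomoles of double-stranded DNA."""
    grams = pmol * 1e-12 * length_bp * AVG_BP_MASS
    return grams * 1e6

vector_pmol = 10.0
vector_bp = 7320                  # length of M13IX22 stated in the text
oligo_pmol = vector_pmol * 5000   # 5000-fold molar excess of oligonucleotide

print(round(pmol_to_ug(vector_pmol, vector_bp), 1))  # 48.3 (ug of vector)
print(oligo_pmol / 1000)                             # 50.0 (nmol of oligo)
```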

E. coli XL1 Blue is electroporated as described by Smith et al., Focus 12:38-40 (1990), which is incorporated herein by reference. The cells are prepared by inoculating a fresh colony of XL1 cells into 5 ml of SOB without magnesium (20 g bacto-tryptone, 5 g bacto-yeast extract, 0.584 g NaCl, 0.186 g KCl, dH2O to 1,000 ml) and growing with vigorous aeration overnight at 37°C. SOB without magnesium (500 ml) is inoculated at 1:1000 with the overnight culture and grown with vigorous aeration at 37°C until the OD550 is 0.8 (about 2 to 3 hours). The cells are harvested by centrifugation at 5,000 rpm (2,600 x g) in a GS3 rotor (Sorvall, Newtown, CT) at 4°C for 10 minutes, resuspended in 500 ml of ice-cold 10% (v/v) sterile glycerol, and centrifuged and resuspended a second time in the same manner. After a third centrifugation, the cells are resuspended in 10% sterile glycerol at a final volume of about 2 ml, such that the OD550 of the suspension is 200 to 300. Usually, resuspension is achieved in the 10% glycerol that remains in the bottle after pouring off the supernatant. Cells are frozen in 40 µl aliquots in microcentrifuge tubes using a dry ice-ethanol bath and stored frozen at -70°C.
Frozen cells are electroporated by thawing slowly on ice before use and mixing with about 10 pg to 500 ng of vector per 40 µl of cell suspension. A 40 µl aliquot is placed in a 0.1 cm electroporation chamber (Bio-Rad, Richmond, CA) and pulsed once at 0°C using a 200 Ω parallel resistor, 25 µF and 1.88 kV, which gives a pulse length (τ) of about 4 ms. A 10 µl aliquot of the pulsed cells is diluted into 1 ml SOC (98 ml SOB plus 1 ml of 2 M MgCl2 and 1 ml of 2 M glucose) in a 12 x 75 mm culture tube, and the culture is shaken at 37°C for 1 hour prior to culturing in selective media (see below).
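The pulse settings follow from simple circuit arithmetic. The Python sketch below computes the RC time constant and the field strength across the 0.1 cm gap. The 800 Ω sample resistance is a hypothetical value, chosen only to show why the observed pulse (about 4 ms) is shorter than the 5 ms set by the 200 Ω resistor and 25 µF capacitor alone: the cell suspension conducts in parallel with the resistor.

```python
def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """RC time constant in ms (ohm x uF = microseconds; /1000 -> ms)."""
    return resistance_ohm * capacitance_uf / 1000

def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

field_v_per_cm = round(1880 / 0.1)  # 1.88 kV across a 0.1 cm gap
print(field_v_per_cm)                            # 18800
print(time_constant_ms(200, 25))                 # 5.0 (resistor alone)
print(time_constant_ms(parallel(200, 800), 25))  # 4.0 (hypothetical 800 ohm sample)
```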

Each of the eighty sublibraries is cultured using methods known to one skilled in the art. Such methods can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, 1989, and in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, New York, 1989, both of which are incorporated herein by reference. Briefly, the above 1 ml sublibrary cultures were grown up by diluting 50-fold into 2XYT media (16 g tryptone, 10 g yeast extract, 5 g NaCl) and culturing at 37°C for 5-8 hours. The bacteria were pelleted by centrifugation at 10,000 x g. The supernatant containing phage was transferred to a sterile tube and stored at 4°C.

Double-stranded vector DNA containing right and left half randomized oligonucleotide inserts is isolated from the cell pellet of each sublibrary. Briefly, the pellet is washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and recollected by centrifugation at 7,000 rpm for 5 minutes in a Sorvall centrifuge (Newtown, CT). Pellets are resuspended in 6 ml of 10% sucrose, 50 mM Tris, pH 8.0. Lysozyme (3.0 ml of 10 mg/ml) is added and the mixture is incubated on ice for 20 minutes. Then 12 ml of 0.2 M NaOH, 1% SDS is added, followed by 10 minutes on ice. The suspensions are then incubated on ice for 20 minutes after addition of 7.5 ml of 3 M NaOAc, pH 4.5. The samples are centrifuged at 15,000 rpm for 15 minutes at 4°C, treated with RNase and extracted with phenol/chloroform, followed by ethanol precipitation. The pellets are resuspended and weighed, and an equal weight of CsCl is dissolved into each tube until a density of 1.60 g/ml is achieved. EtBr is added to 600 µg/ml and the double-stranded DNA is isolated by equilibrium centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm for 6 hours. These DNAs from each right and left half sublibrary are used to generate forty libraries in which the right and left halves of the randomized oligonucleotides have been randomly joined together.

Each of the forty libraries is produced by joining together one right half and one left half sublibrary. The two sublibraries joined together correspond to the same column number for right and left half random oligonucleotide synthesis. For example, sublibrary M13IX42.1R is joined with M13IX22.1L to produce the surface expression library M13IX.1RL. In the alternative situation where only two sublibraries are generated from the combined populations of all right half synthesis and all left half synthesis, only one surface expression library would be produced.

For the random joining of each pair of right and left half oligonucleotide populations into a single surface expression vector species, the DNAs isolated from each sublibrary are digested with an excess of Fok I (New England Biolabs). The reactions are stopped by phenol/chloroform extraction, followed by ethanol precipitation. Pellets are resuspended in dH2O. Each surface expression library is generated by ligating equimolar amounts (5-10 pmol) of Fok I-digested DNA isolated from corresponding right and left half sublibraries in 10 µl of 1X ligase buffer containing 1.0 U of T4 DNA ligase (Bethesda Research Laboratories, Gaithersburg, MD). The ligations proceed overnight at 16°C and are electroporated into the sup O strain MK30-3 (Boehringer Mannheim Biochemical (BMB), Indianapolis, IN) as previously described for XL1 cells. Because MK30-3 is sup O, only vectors in which the portions encoding the right and left half randomized oligonucleotides have come together, and which therefore lack the amber stop codons, will produce viable phage.

Screening of Surface Expression Libraries

Purified phage are prepared from 50 ml liquid cultures of XL1 Blue cells (Stratagene) which are infected at a multiplicity of infection (m.o.i.) of 10 from the phage stocks stored at 4°C. The cultures are induced with 2 mM IPTG. Supernatants from all cultures are combined and cleared by two centrifugations, and the phage are precipitated by adding 1/7.5 volume of PEG solution (25% PEG-8000, 2.5 M NaCl), followed by incubation at 4°C overnight. The precipitate is recovered by centrifugation for 90 minutes at 10,000 x g. Phage pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH 7.6, 1.0 mM EDTA and 0.1% Sarkosyl and then shaken slowly at room temperature for 30 minutes. The solutions are adjusted to 0.5 M NaCl and to a final concentration of 5% polyethylene glycol. After 2 hours at 4°C, the precipitates containing the phage are recovered by centrifugation for 1 hour at 15,000 x g. The precipitates are resuspended in 10 ml of NET buffer (0.1 M NaCl, 1.0 mM EDTA and 0.01 M Tris-HCl, pH 7.6), mixed well, and the phage are repelleted by centrifugation at 170,000 x g for 3 hours. The phage pellets are subsequently resuspended overnight in 2 ml of NET buffer and subjected to cesium chloride centrifugation for 18 hours at 110,000 x g (3.86 g of cesium chloride in 10 ml of buffer). Phage bands are collected, diluted 7-fold with NET buffer, recentrifuged at 170,000 x g for 3 hours, resuspended, and stored at 4°C in 0.3 ml of NET buffer containing 0.1 mM sodium azide.
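Adding 1/7.5 volume of the PEG/NaCl stock dilutes it 8.5-fold, because the stock makes up (1/7.5)/(1 + 1/7.5) = 1/8.5 of the final volume. The Python sketch below works through the numbers per 50 ml of supernatant; the function name is illustrative.

```python
def final_conc(stock: float, added_volumes: float) -> float:
    """Concentration after adding `added_volumes` volumes of a stock to one
    volume of sample; e.g. added_volumes=1/7.5 for a 1/7.5-volume addition."""
    return stock * added_volumes / (1 + added_volumes)

supernatant_ml = 50.0
peg_stock_ml = supernatant_ml / 7.5
print(round(peg_stock_ml, 2))                 # 6.67 (ml of PEG solution)
print(round(final_conc(25.0, 1 / 7.5), 2))    # 2.94 (% PEG-8000)
print(round(final_conc(2.5, 1 / 7.5), 3))     # 0.294 (M NaCl)
```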



Ligand binding proteins used for panning on streptavidin-coated dishes are first biotinylated and then adsorbed against UV-inactivated blocking phage (see below).
The biotinylating reagent is dissolved in dimethylformamide at a ratio of 2.4 mg solid NHS-SS-Biotin (sulfosuccinimidyl 2-(biotinamido)ethyl-1,3'-dithiopropionate; Pierce, Rockford, IL) to 1 ml solvent and used as recommended by the manufacturer. Small-scale reactions are accomplished by mixing 1 µl of dissolved reagent with 43 µl of 1 mg/ml ligand binding protein diluted in sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2 hours at 25°C, residual biotinylating reagent is reacted with 500 µl of 1 M ethanolamine (pH adjusted to 9 with HCl) for an additional 2 hours. The entire sample is diluted with 1 ml TBS containing 1 mg/ml BSA, concentrated to about 50 µl on a Centricon 30 ultrafilter (Amicon), and washed on the same filter three times with 2 ml TBS and once with 1 ml TBS containing 0.02% NaN3 and 7 x 10^^ UV-inactivated blocking phage (see below); the final retentate (60-80 µl) is stored at 4°C. Ligand binding proteins biotinylated with the NHS-SS-Biotin reagent are linked to biotin via a disulfide-containing chain.
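How much reagent this small-scale reaction supplies per protein molecule depends on molecular weights that the text does not give. The Python sketch below estimates the molar ratio under two stated assumptions: a molecular weight of about 606.7 g/mol for the sulfo-NHS-SS-biotin reagent and a hypothetical 150 kDa ligand binding protein (roughly antibody-sized). The actual values for a given protein would replace these.

```python
REAGENT_MG_PER_ML = 2.4   # from the text: 2.4 mg reagent per 1 ml solvent
REAGENT_MW = 606.7        # g/mol, assumed for sulfo-NHS-SS-biotin
PROTEIN_MG_PER_ML = 1.0   # from the text: 1 mg/ml ligand binding protein
PROTEIN_MW = 150_000      # g/mol, hypothetical protein size

def nmol(volume_ul: float, mg_per_ml: float, mw_g_per_mol: float) -> float:
    """Nanomoles in `volume_ul` microliters of a solution at `mg_per_ml`."""
    micrograms = volume_ul * mg_per_ml          # 1 ul at 1 mg/ml = 1 ug
    return micrograms / mw_g_per_mol * 1000     # ug / (g/mol) = umol -> nmol

reagent_nmol = nmol(1, REAGENT_MG_PER_ML, REAGENT_MW)   # 1 ul of reagent
protein_nmol = nmol(43, PROTEIN_MG_PER_ML, PROTEIN_MW)  # 43 ul of protein
print(f"{reagent_nmol:.2f} nmol biotin reagent : {protein_nmol:.3f} nmol protein")
print(f"about {reagent_nmol / protein_nmol:.0f}-fold molar excess of reagent")
```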

UV-irradiated M13 phage were used for blocking binding proteins which fortuitously bound filamentous phage in general. M13mp8 (Messing and Vieira, Gene 19:262-276 (1982), which is incorporated herein by reference) was chosen because it carries two amber stop codons, which ensure that the few phage surviving irradiation will not grow in the sup O strains used to titer the surface expression libraries. A 5 ml sample containing 5 x 10 M13mp8 phage, purified as described above, was placed in a small petri plate and irradiated with a germicidal lamp at a distance of two feet for 7 minutes (flux 150 µW/cm²). NaN3 was added to 0.02% and the phage particles were concentrated to 10^^ particles/ml on a Centricon 30-kDa ultrafilter (Amicon).
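Taking the lamp flux as 150 µW/cm², the 7 minute exposure corresponds to a UV dose that can be checked with one line of arithmetic, sketched below in Python under that reading of the units.

```python
flux_uw_per_cm2 = 150   # germicidal lamp flux at two feet (read as uW/cm^2)
exposure_s = 7 * 60     # 7 minutes of irradiation

# uW/cm^2 x s = uJ/cm^2; divide by 1000 to express the dose in mJ/cm^2.
dose_mj_per_cm2 = flux_uw_per_cm2 * exposure_s / 1000
print(dose_mj_per_cm2)  # 63.0
```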
For panning, polystyrene petri plates (60 x 15 mm, Falcon; Becton Dickinson, Lincoln Park, NJ) are incubated with 1 ml of 1 mg/ml streptavidin (BMB) in 0.1 M NaHCO3, pH 8.6-0.02% NaN3 in a small, air-tight plastic box overnight in a cold room. The next day the streptavidin is removed and replaced with at least 10 ml blocking solution (29 mg/ml of BSA; 3 ng/ml of streptavidin; 0.1 M NaHCO3, pH 8.6-0.02% NaN3) and incubated for at least 1 hour at room temperature. The blocking solution is removed and the plates are washed rapidly three times with Tris-buffered saline containing 0.5% Tween 20 (TBS-0.5% Tween 20).

Selection of phage expressing peptides bound by the ligand binding proteins is performed with 5 µl (2.7 µg ligand binding protein) of blocked biotinylated ligand binding proteins reacted with a 50 µl portion of each library. Each mixture is incubated overnight at 4°C, diluted with 1 ml TBS-0.5% Tween 20, and transferred to a streptavidin-coated petri plate prepared as described above. After rocking for 10 minutes at room temperature, unbound phage are removed and the plates are washed ten times with TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound phage are eluted from the plates with 800 µl sterile elution buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with glycine) for 15 minutes, and the eluates are neutralized with 48 µl 2 M Tris (pH unadjusted). A 20 µl portion of each eluate is titered on MK30-3 concentrated cells alongside dilutions of the input phage.
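Panning results are usually compared as the fraction of input phage recovered in the eluate. The Python sketch below shows that calculation. The plaque counts, the input dilution and the 10 µl input plating volume are hypothetical; the 20 µl eluate portion, the 848 µl neutralized eluate (800 µl plus 48 µl of 2 M Tris) and the 50 µl library input follow the text. The function names are illustrative.

```python
def titer_pfu_per_ml(plaques: int, dilution_factor: float, volume_plated_ml: float) -> float:
    """Plaque-forming units per ml of the undiluted sample."""
    return plaques * dilution_factor / volume_plated_ml

def fractional_yield(eluted_pfu: float, input_pfu: float) -> float:
    """Fraction of the input phage recovered after panning."""
    return eluted_pfu / input_pfu

# Hypothetical plate counts for illustration only.
eluate_titer = titer_pfu_per_ml(plaques=120, dilution_factor=1, volume_plated_ml=0.020)
eluate_pfu = eluate_titer * 0.848        # 800 ul eluate + 48 ul 2 M Tris
input_titer = titer_pfu_per_ml(plaques=250, dilution_factor=1e8, volume_plated_ml=0.010)
input_pfu = input_titer * 0.050          # 50 ul portion of the library

print(f"eluate: {eluate_pfu:.3g} pfu, input: {input_pfu:.3g} pfu")
print(f"yield: {fractional_yield(eluate_pfu, input_pfu):.2e}")
```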

A second round of panning is performed by treating 750 µl of the first eluate from each library with 5 mM DTT for 10 minutes to break the disulfide bonds linking biotin groups to residual biotinylated binding proteins. The treated eluate is concentrated on a Centricon 30 ultrafilter (Amicon), washed three times with TBS-0.5% Tween 20, and concentrated to a final volume of about 50 µl. The final retentate is transferred to a tube containing 5.0 µl (2.7 µg ligand binding protein) of blocked biotinylated ligand binding proteins and incubated overnight. The solution is diluted with 1 ml TBS-0.5% Tween 20, panned, and eluted as described above on fresh streptavidin-coated petri plates. The entire second eluate (800 µl) is neutralized with 48 µl 2 M Tris, and 20 µl is titered simultaneously with the first eluate and dilutions of the input phage.

Individual phage populations are purified through 2 to 3 rounds of plaque purification. Briefly, the second eluate titer plates are lifted with nitrocellulose filters (Schleicher & Schuell, Inc., Keene, NH) and processed by washing for 15 minutes in TBS (10 mM Tris-HCl, pH 7.2, 150 mM NaCl), followed by incubation with shaking for an additional 1 hour at 37°C with TBS containing 5% nonfat dry milk (TBS-5% NDM) at 0.5 ml/cm². The wash is discarded and fresh TBS-5% NDM (0.1 ml/cm²) is added containing the ligand binding protein at between 1 nM and 100 mM, preferably between 1 and 100 µM. All incubations are carried out in heat-sealable pouches (Sears). Incubation with the ligand binding protein proceeds for 12-16 hours at 4°C with shaking. The filters are removed from the bags and washed 3 times for 30 minutes at room temperature with 150 ml of TBS containing 0.1% NDM and 0.2% NP-40 (Sigma, St. Louis, MO). The filters are then incubated for 2 hours at room temperature in antiserum against the ligand binding protein at an appropriate dilution in TBS-0.5% NDM, washed in 3 changes of TBS containing 0.1% NDM and 0.2% NP-40 as described above, and incubated in TBS containing 0.1% NDM and 0.2% NP-40 with 1 x 10^ cpm of 125I-labeled Protein A (specific activity = 2.1 x 10^ cpm/µg). After washing with TBS containing 0.1% NDM and 0.2% NP-40 as described above, the filters are wrapped in Saran Wrap and exposed to Kodak X-Omat x-ray film (Kodak, Rochester, NY) for 1-12 hours at -70°C using Dupont Cronex Lightning Plus Intensifying Screens (Dupont, Wilmington, DE).
Positive plaques identified are cored with the large end of a Pasteur pipet and placed into 1 ml of SM (5.8 g NaCl, 2 g MgSO4·7H2O, 50 ml 1 M Tris-HCl, pH 7.5, 5 ml 2% gelatin, to 1000 ml with dH2O) plus 1-3 drops of CHCl3, and incubated at 37°C for 2-3 hours or overnight at 4°C. The phage are diluted 1:500 in SM, and 2 µl are added to 300 µl of XL1 cells plus 3 ml of soft agar per 100 mm plate. The XL1 cells are prepared for plating by growing a colony overnight in 10 ml LB (10 g bacto-tryptone, 5 g bacto-yeast extract, 10 g NaCl, 1000 ml dH2O) containing 100 µl of 20% maltose and 100 µl of 1 M MgSO4. The bacteria are pelleted by centrifugation at 2,000 x g for 10 minutes and the pellet is resuspended gently in 10 ml of 10 mM MgSO4. The suspension is diluted 4-fold by adding 30 ml of 10 mM MgSO4 to give an OD of approximately 0.5. The second and third round screens are identical to that described above, except that the plaques are cored with the small end of a Pasteur pipet and placed into 0.5 ml SM plus a drop of CHCl3, and 1-5 µl of the phage following incubation are used for plating without dilution. At the end of the third round of purification, an individual plaque is picked and the templates are prepared for sequencing.

Template Preparation and Sequencing

Templates are prepared for sequencing by inoculating
a 1 ml culture of 2XYT containing a 1:100 dilution of an
overnight culture of XL1 with an individual plaque. The
plaques are picked using a sterile toothpick. The culture
is incubated at 37°C for 5-6 hours with shaking and then
transferred to a 1.5 ml microfuge tube. 200 µl of PEG
solution is added, followed by vortexing, and the tube is
placed on ice for 10 minutes. The phage precipitate is
recovered by centrifugation in a microfuge at 12,000 x g for
5 minutes. The supernatant is discarded and the pellet is
resuspended in 230 µl of TE (10 mM Tris-HCl, pH 7.5, 1 mM
EDTA) by gently pipetting with a yellow pipet tip. Phenol
(200 µl) is added, followed by a brief vortex, and the sample
is microfuged to separate the phases. The aqueous phase is
transferred to a separate tube and extracted with 200 µl of
phenol/chloroform (1:1) as described above for the phenol
extraction. A 0.1 volume of 3 M NaOAc is added, followed
by 2.5 volumes of ethanol, and the templates are precipitated
at -20°C for 20 minutes. The precipitated templates are
recovered by centrifugation in a microfuge at 12,000 x g
for 8 minutes. The pellet is washed in 70% ethanol, dried
and resuspended in 25 µl TE. Sequencing was performed
using a Sequenase™ sequencing kit following the protocol
supplied by the manufacturer (U.S. Biochemical, Cleveland,
OH).
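The volumes in the ethanol precipitation step follow from simple ratios. The Python sketch below is an illustration only: the function names are hypothetical, the 200 µl aqueous volume is an assumed example, and it assumes that the 2.5 volumes of ethanol are measured against the aqueous phase plus the added salt, which the protocol does not state.

```python
def precipitation_volumes(aqueous_ul: float) -> tuple[float, float]:
    """Return (3 M NaOAc volume, ethanol volume) in microliters.

    Assumption: the 2.5 volumes of ethanol are counted against the
    aqueous phase plus the added salt, a convention the protocol does
    not spell out.
    """
    naoac_ul = aqueous_ul / 10                  # 0.1 volume of 3 M NaOAc
    ethanol_ul = 2.5 * (aqueous_ul + naoac_ul)  # 2.5 volumes of ethanol
    return naoac_ul, ethanol_ul


def salt_molarity(aqueous_ul: float, stock_molar: float = 3.0) -> float:
    """Sodium acetate concentration after the salt is added, before ethanol."""
    naoac_ul, _ = precipitation_volumes(aqueous_ul)
    return stock_molar * naoac_ul / (aqueous_ul + naoac_ul)


# Hypothetical aqueous phase of 200 µl recovered after extraction.
naoac, etoh = precipitation_volumes(200)
print(f"3 M NaOAc: {naoac:.0f} µl, ethanol: {etoh:.0f} µl")
print(f"NaOAc before ethanol: {salt_molarity(200):.2f} M")
```

For a 200 µl aqueous phase this gives 20 µl of 3 M NaOAc and 550 µl of ethanol, with the acetate at roughly 0.27 M before the ethanol is added.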



EXAMPLE II



Isolation and Characterization of Peptide Ligands Generated
from Oligonucleotides Having Random Codons at Two
Predetermined Positions

This example shows the generation of a surface
expression library from a population of oligonucleotides
having randomized codons. The oligonucleotides are ten
codons in length and are cloned into a single vector
species for the generation of an M13 gene VIII-based surface
expression library. The example also shows the selection
of peptides for a ligand binding protein and
characterization of their encoded nucleic acid sequences.

Oligonucleotide Synthesis

Oligonucleotides were synthesized as described in
Example I. The synthesizer was programmed to synthesize
the sequences shown in Table IX. These sequences
correspond to the first random codon position synthesized
and the 3' flanking sequence of the oligonucleotide, which
hybridizes to the leader sequence in the vector. The
complementary sequences are used for insertional
mutagenesis of the synthesized population of
oligonucleotides.
Table IX

Column        Sequence (5' to 3')

column 1      AA(A/C)GGTTGGTCGGTACCGG
column 2      AG(A/G)GGTTGGTCGGTACCGG
column 3      AT(A/G)GGTTGGTCGGTACCGG
column 4      AC(A/G)GGTTGGTCGGTACCGG
column 5      CA(G/T)GGTTGGTCGGTACCGG
column 6      CT(G/C)GGTTGGTCGGTACCGG
column 7      AG(T/C)GGTTGGTCGGTACCGG
column 8      AT(T/C)GGTTGGTCGGTACCGG
column 9      CC(A/C)GGTTGGTCGGTACCGG
column 10     T(A/T)TGGTTGGTCGGTACCGG
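Read on the sense strand, each Table IX column supplies two codons. The sketch below, which assumes the synthesized strand is antisense (as Example III states for its own populations), reverse-complements the codon portion of each column and translates it with the standard genetic code. The helper names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Codon portion of each Table IX column, written 5' to 3' as synthesized.
TABLE_IX_CODONS = {
    1: "AA(A/C)", 2: "AG(A/G)", 3: "AT(A/G)", 4: "AC(A/G)", 5: "CA(G/T)",
    6: "CT(G/C)", 7: "AG(T/C)", 8: "AT(T/C)", 9: "CC(A/C)", 10: "T(A/T)T",
}


def expand(pattern: str) -> list[str]:
    """Expand a degenerate pattern such as 'AA(A/C)' into ['AAA', 'AAC']."""
    parts = re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern)
    choices = [alt.split("/") if alt else [base] for alt, base in parts]
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def column_residues(pattern: str) -> set[str]:
    """Amino acids on the sense strand, assuming the pattern is antisense."""
    return {CODON_TABLE[reverse_complement(c)] for c in expand(pattern)}


if __name__ == "__main__":
    for col, pat in TABLE_IX_CODONS.items():
        print(col, pat, sorted(column_residues(pat)))
```

Under that assumption the ten columns each encode two distinct amino acids, together covering all 20 with no stop codon.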

The next eight random codon positions were synthesized
as described for Table V in Example I. Following synthesis
of the ninth position, the reaction products were once more
combined, mixed and redistributed into 10 new reaction
columns. The sequences synthesized for the last random codon
position and the 5' flanking sequence are shown in Table X.
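The effect of repeatedly combining, mixing and redistributing the columns can be illustrated with a small simulation. This is an idealized model, not the synthesizer's chemistry: it assumes perfect random redistribution among the ten columns, 50:50 coupling of each two-base mixture, and the column-to-amino-acid pairs obtained by reverse-complementing the Table IX codons. The function names are hypothetical.

```python
import random
from collections import Counter

# Two amino acids per column, obtained by reverse-complementing the Table IX
# codons (assumes the synthesized strand is antisense).
COLUMN_RESIDUES = [
    ("F", "V"), ("S", "P"), ("Y", "H"), ("C", "R"), ("L", "M"),
    ("Q", "E"), ("T", "A"), ("N", "D"), ("W", "G"), ("I", "K"),
]


def split_and_pool(n_chains: int, n_positions: int, seed: int = 0) -> list[str]:
    """Idealized mix-and-divide synthesis.

    Before each codon position the pooled support is assumed to be
    redistributed at random among the ten columns, and the two-base mixture
    in each column is assumed to couple 50:50. Each chain is simulated
    independently, a simplification of real bead chemistry.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(rng.choice(COLUMN_RESIDUES)) for _ in range(n_positions))
        for _ in range(n_chains)
    ]


def theoretical_diversity(n_positions: int, codons_per_position: int = 20) -> int:
    """Distinct sequences when every position samples the same codon set."""
    return codons_per_position ** n_positions


peptides = split_and_pool(n_chains=2000, n_positions=10)
counts = Counter("".join(peptides))
print(f"{theoretical_diversity(10):,} possible ten-residue peptides")
print("residue counts range:", min(counts.values()), "to", max(counts.values()))
```

With 20 codons per position and no stop codons, ten positions allow 20^10 (about 1.0 x 10^13) peptides; in the simulation each residue appears at close to the expected 1/20 frequency.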

Table X

Column        Sequence (5' to 3')

column 1      AGGATCCGCCGAGCTCAA(A/C)A
column 2      AGGATCCGCCGAGCTCAG(A/G)A
column 3      AGGATCCGCCGAGCTCAT(A/G)A
column 4      AGGATCCGCCGAGCTCAG(A/G)A
column 5      AGGATCCGCCGAGCTCCA(G/T)A
column 6      AGGATCCGCCGAGCTCCT(G/C)A
column 7      AGGATCCGCCGAGCTCAG(T/C)A
column 8      AGGATCCGCCGAGCTCAT(T/C)A
column 9      AGGATCCGCCGAGCTCCC(A/C)A
column 10     AGGATCCGCCGAGCTCT(A/T)TA
The reaction products were mixed once more and the
oligonucleotides cleaved and purified as recommended by the
manufacturer. The purified population of oligonucleotides
was used to generate a surface expression library as
described below.

Vector Construction 

The vector used for generating surface expression
libraries from a single oligonucleotide population (i.e.,
without joining together of right and left half
oligonucleotides) is described below. The vector is an M13-
based expression vector which directs the synthesis of gene
VIII-peptide fusion proteins (Figure 4). This vector
exhibits all the functions that the combined right and left
half vectors of Example I exhibit.

An M13-based vector was constructed for the cloning
and surface expression of populations of random
oligonucleotides (Figure 4, M13IX30). M13mp19 (Pharmacia)
was the starting vector. This vector was modified to
contain, in addition to the encoded wild type M13 gene
VIII: (1) a pseudo-wild type gene VIII sequence with
an amber stop codon placed between it and the restriction
sites for cloning oligonucleotides; (2) Stu I, Spe I and
Xho I restriction sites in frame with the pseudo-wild type
gVIII for cloning oligonucleotides; (3) sequences necessary
for expression, such as a promoter, signal sequence and
translation initiation signals; and (4) various other mutations
to remove redundant restriction sites and the amino
terminal portion of Lac Z.

Construction of M13IX30 was performed in four steps.
In the first step, a precursor vector, M13IX01F, containing
the pseudo gene VIII and various other mutations was
constructed. The second step involved the construction of a
small cloning site in a separate M13mp18 vector to yield
M13IX03. In the third step, expression sequences and
cloning sites were constructed in M13IX03 to generate the
intermediate vector M13IX04B. The fourth step involved the
incorporation of the newly constructed sequences from the
intermediate vector into M13IX01F to yield M13IX30.
Incorporation of these sequences linked them with the
pseudo gene VIII.

Construction of the precursor vector M13IX01F was
similar to that of M13IX42 described in Example I except
for the following features: (1) M13mp19 was used as the
starting vector; (2) the Fok I site 5' to the unique Eco
RI site was not incorporated and the overhang at the
naturally occurring Fok I site at position 3547 was not
changed to 5'-CTTC-3'; (3) the spacer sequence was not
incorporated between the Eco RI and Sac I sites; and (4)
the amber codon at position 44S2 was not incorporated.

In the second step, M13mp18 was mutated to remove the
5' end of Lac Z up to the Lac I binding site and including
the Lac Z ribosome binding site and start codon.
Additionally, the polylinker was removed and an Mlu I site
was introduced in the coding region of Lac Z. A single
oligonucleotide was used for this mutagenesis and had the
sequence 5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3'
(SEQ ID NO: 41). Restriction enzyme sites for Hind III
and Eco RI were introduced downstream of the Mlu I site
using the oligonucleotide
5'-GGCGAAAGGGAATTCTGCAAGGCGATTAAGCTTGGGTAACGCC-3' (SEQ ID
NO: 42). These modifications of M13mp18 yielded the vector
M13IX03.

The expression sequences and cloning sites were
introduced into M13IX03 by chemically synthesizing a series
of oligonucleotides which encode both strands of the
desired sequence. The oligonucleotides are presented in
Table XI (SEQ ID NOS: 43 through 50).
TABLE XI

M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotides   Sequence (5' to 3')

084                GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027                TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028                TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029                TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom Strand
Oligonucleotides   Sequence (5' to 3')

085                TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031                GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032                TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033                GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA

The above oligonucleotides, except for the terminal
oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO:
47) of Table XI, were mixed, phosphorylated, annealed and
ligated to form a double-stranded insert as described in
Example I. However, instead of cloning directly into the
intermediate vector, the insert was first amplified by PCR
using the terminal oligonucleotides 084 (SEQ ID NO: 43) and
085 (SEQ ID NO: 47) as primers. The terminal
oligonucleotide 084 (SEQ ID NO: 43) contains a Hind III
site 10 nucleotides internal to its 5' end.
Oligonucleotide 085 (SEQ ID NO: 47) has an Eco RI site at
its 5' end. Following amplification, the products were
restricted with Hind III and Eco RI and ligated as
described in Example I into the polylinker of M13mp18
digested with the same two enzymes. The resultant double-
stranded insert contained a ribosome binding site, a
translation initiation codon followed by a leader sequence,
and three restriction enzyme sites for cloning random
oligonucleotides (Xho I, Stu I, Spe I). The vector was
named M13IX04.
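The restriction sites that the text attributes to the Table XI oligonucleotides can be checked with a plain string scan. This is a minimal sketch, assuming the standard recognition sequences for the six enzymes named in this example; because each of these sites is palindromic, scanning one strand is sufficient. The function name is hypothetical.

```python
# Recognition sequences (5' to 3') for the enzymes named in this example.
# All six are palindromic, so scanning one strand finds sites on both.
SITES = {
    "Hind III": "AAGCTT",
    "Eco RI": "GAATTC",
    "Bam HI": "GGATCC",
    "Xho I": "CTCGAG",
    "Stu I": "AGGCCT",
    "Spe I": "ACTAGT",
}

# Table XI oligonucleotides, 5' to 3'.
TABLE_XI = {
    "084": "GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG",
    "027": "TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT",
    "028": "TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC",
    "029": "TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG",
    "085": "TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG",
    "031": "GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT",
    "032": "TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA",
    "033": "GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition sequence in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


for name, seq in TABLE_XI.items():
    print(name, find_sites(seq))
```

The Hind III site of 084 starts after 10 nucleotides, matching the text, and the Stu I and Spe I cloning sites appear in 029, with Xho I in its complement 031.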

During cloning of the double-stranded insert, it was
found that one of the GCC codons in oligonucleotide 028
and its complement in 031 was deleted. Since this deletion
did not affect function, the final construct is missing one
of the two GCC codons. Additionally, oligonucleotide 032
contained a GTG codon where a GAG codon was needed.
Mutagenesis was performed using the oligonucleotide
5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 51) to convert the
codon to the desired sequence. The resultant intermediate
vector was named M13IX04B.
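The single base change introduced by SEQ ID NO: 51 can be located by sliding the mutagenic oligonucleotide along the bottom strand without gaps. This sketch assumes that oligonucleotides 032 and 033 abut end to end in the annealed insert; the function name is hypothetical.

```python
def best_ungapped_match(probe: str, template: str) -> tuple[int, list[int]]:
    """Slide probe along template (no gaps) and return the offset with the
    fewest mismatches, plus the probe positions that mismatch there."""
    best = None
    for offset in range(len(template) - len(probe) + 1):
        window = template[offset:offset + len(probe)]
        mism = [i for i, (p, t) in enumerate(zip(probe, window)) if p != t]
        if best is None or len(mism) < len(best[1]):
            best = (offset, mism)
    return best


# Bottom-strand oligonucleotides 032 and 033 of Table XI, assumed to abut
# end to end in the annealed insert.
BOTTOM = ("TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA"
          "GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA")
PROBE = "TAACGGTAAGAGTGCCAGTGC"  # SEQ ID NO: 51

offset, mismatches = best_ungapped_match(PROBE, BOTTOM)
for i in mismatches:
    print(f"offset {offset}, probe base {i}: {BOTTOM[offset + i]} -> {PROBE[i]}")
```

The best alignment has a single mismatch, which converts GTG to GAG in the bottom-strand sequence, as the text describes.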

The fourth step in constructing M13IX30 involved
inserting the expression and cloning sequences from
M13IX04B upstream of the pseudo-wild type gVIII in
M13IX01F. This was accomplished by digesting M13IX04B with
Dra III and Bam HI and gel isolating the 700 base pair
insert containing the sequences of interest. M13IX01F was
likewise digested with Dra III and Bam HI. The insert was
combined with the double-digested vector at a molar ratio
of 3:1 and ligated as described in Example I. It should be
noted that all modifications in the vectors described
herein were confirmed by sequence analysis. The sequence
of the final construct, M13IX30, is shown in Figure 7 (SEQ
ID NO: 3). Figure 4 also shows M13IX30, with each of the
elements necessary for surface expression of randomized
oligonucleotides marked.
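The 3:1 insert-to-vector ratio is a molar ratio, so the masses to combine depend on fragment length. The sketch below uses the 700 base pair insert from the text; the 100 ng vector amount and the 7,000 bp vector length are hypothetical placeholders, not values from this example, and the function name is illustrative.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector: float = 3.0) -> float:
    """Mass of insert giving the requested insert:vector molar ratio.

    Moles of double-stranded DNA scale as mass / length, so
    insert_ng = ratio * vector_ng * insert_bp / vector_bp.
    """
    return insert_to_vector * vector_ng * insert_bp / vector_bp


# 700 bp Dra III-Bam HI insert from the text; 100 ng of vector and a
# 7,000 bp vector length are hypothetical placeholders.
print(insert_mass_ng(vector_ng=100, vector_bp=7000, insert_bp=700))  # 30.0
```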
Library Construction, Screening and Characterization of
Encoded Oligonucleotides

Construction of an M13IX30 surface expression library
is accomplished identically to that described in Example I
for sublibrary construction, except that the oligonucleotides
described above are inserted into M13IX30 by mutagenesis
instead of by ligation. The library is constructed and
propagated on MK30-3 (BMB), and phage stocks are prepared
for infection of XL1 cells and screening. The surface
expression library is screened and the encoding
oligonucleotides characterized as described in Example I.

EXAMPLE III

Isolation and Characterization of Peptide Ligands
Generated from Right and Left Half
Degenerate Oligonucleotides

This example shows the construction and expression
of a surface expression library of degenerate
oligonucleotides. The encoded peptides of this example
derive from the mixing and joining together of two
separate oligonucleotide populations. Also demonstrated
is the isolation and characterization of peptide ligands,
and their corresponding nucleotide sequences, for specific
binding proteins.

Synthesis of Oligonucleotide Populations

A population of left half degenerate
oligonucleotides and a population of right half
degenerate oligonucleotides were synthesized using
standard automated procedures as described in Example I.



The degenerate codon sequences for each population
of oligonucleotides were generated by sequentially
synthesizing the triplet NNG/T, where N is an equal
mixture of all four nucleotides. The antisense sequence
for each population of oligonucleotides was synthesized,
and each population contained 5' and 3' flanking
sequences complementary to the vector sequence. The
complementary termini were used to incorporate each
population of oligonucleotides into its respective
vector by standard mutagenesis procedures. Such
procedures have been described previously in Example I
and in the Detailed Description. Synthesis of the
antisense sequence of each population was necessary since
the single-stranded form of the vectors is obtained only
as the sense strand.

The left half oligonucleotide population was
synthesized having the following sequence:
5'-AGCTCCCGGATGCCTCAGAAGATG(A/CNN)₉GGCTTTTGCCACAGGGG-3'
(SEQ ID NO: 52). The right half oligonucleotide population
was synthesized having the following sequence:
5'-CAGCCTCGGATCCGCC(A/CNN)₁₀ATG(A/C)GAAT-3' (SEQ ID NO: 53).
These two oligonucleotide populations, when incorporated
into their respective vectors and joined together, encode
a 20 codon oligonucleotide having 19 degenerate positions
and an internal predetermined codon sequence.
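The NNG/T codon scheme can be evaluated directly against the standard genetic code. This sketch counts the 32 sense-strand codons, the stop codons among them and the codons per amino acid, and then, assuming the 19 degenerate positions are independent, the fraction of inserts free of stop codons and the number of distinct DNA sequences.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Sense-strand NNG/T codons: any base at positions 1 and 2, G or T at position 3.
NNK = ["".join(c) for c in product("ACGT", "ACGT", "GT")]

stops = [c for c in NNK if CODON_TABLE[c] == "*"]
codons_per_residue = Counter(CODON_TABLE[c] for c in NNK if CODON_TABLE[c] != "*")

# 9 degenerate codons in the left half plus 10 in the right half (SEQ ID NOS: 52, 53).
DEGENERATE_POSITIONS = 9 + 10
p_no_stop = (1 - len(stops) / len(NNK)) ** DEGENERATE_POSITIONS
dna_diversity = len(NNK) ** DEGENERATE_POSITIONS

print(len(NNK), "codons; stop codons:", stops)
print(len(codons_per_residue), "amino acids; codons per residue:",
      dict(sorted(codons_per_residue.items())))
print(f"fraction of inserts free of stop codons: {p_no_stop:.3f}")
print(f"distinct DNA sequences: {dna_diversity:.2e}")
```

TAG, the only stop among these codons, is the amber codon; the calculation does not account for amber suppression in any host strain.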

Vector Construction 

Modified forms of the previously described vectors
were used for the construction of right and left half
sublibraries. The construction of left half sublibraries
was performed in an M13-based vector termed M13ED03.
This vector is a modified form of the previously
described M13IX30 vector and contains all the essential
features of both M13IX30 and M13IX22. M13ED03 contains,
in addition to a wild type and a pseudo-wild type gene
VIII, sequences necessary for expression and two Fok I
sites for joining with a right half oligonucleotide
sublibrary. Therefore, this vector combines the
advantages of both previous vectors in that it can be
used for the generation and expression of surface
expression libraries from a single oligonucleotide
population, or it can be joined with a sublibrary to bring
together right and left half oligonucleotide populations
into a surface expression library.

M13ED03 was constructed in two steps from M13IX30.
The first step involved the modification of M13IX30 to
remove a redundant sequence and to incorporate a sequence
encoding the eight amino-terminal residues of human β-
endorphin. The leader sequence was also mutated to
increase secretion of the product.

During construction of M13IX04 (an intermediate
vector to M13IX30, which is described in Example II), a
six nucleotide sequence was duplicated in oligonucleotide
027 (SEQ ID NO: 44) and its complement 032 (SEQ ID NO:
49). This sequence, 5'-TTACCG-3', was deleted by
mutagenesis in the construction of M13ED01. The
oligonucleotide used for the mutagenesis was
5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ ID NO: 54). The
mutation in the leader sequence was generated using the
oligonucleotide 5'-GGGCTTTTGCCACAGGGGT-3' (SEQ ID NO:
55). This mutagenesis resulted in the A residue at
position 6353 of M13IX30 being changed to a G residue.
The resultant vector was designated M13IX32.
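The deletion of the duplicated 5'-TTACCG-3' sequence can be checked on the bottom strand, where it reads 5'-CGGTAA-3'. This is a sketch, assuming that 032 and 033 abut end to end and that the GTG to GAG correction made with SEQ ID NO: 51 is already present; the helper name is hypothetical.

```python
def delete_tandem_repeat(seq: str, unit: str) -> str:
    """Remove one copy of a directly repeated unit (unit + unit -> unit)."""
    i = seq.find(unit + unit)
    if i < 0:
        raise ValueError(f"no tandem copy of {unit} found")
    return seq[:i] + seq[i + len(unit):]


# Bottom strand from Table XI (032 followed by 033) with the GTG -> GAG
# correction applied; joining the two oligonucleotides is an assumption.
BOTTOM_FIXED = ("TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGAGTGCCA"
                "GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA")
# The duplicated top-strand 6-mer 5'-TTACCG-3' reads 5'-CGGTAA-3' here.
UNIT = "CGGTAA"
SEQ_ID_54 = "GGTAAACAGTAACGGTAAGAGTGCCAG"

edited = delete_tandem_repeat(BOTTOM_FIXED, UNIT)
print("SEQ ID NO: 54 matches edited strand:", SEQ_ID_54 in edited)
print("SEQ ID NO: 54 matches unedited strand:", SEQ_ID_54 in BOTTOM_FIXED)
```

SEQ ID NO: 54 matches the edited sequence exactly but not the sequence that still carries the duplication.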

To generate M13ED01, the nucleotide sequence
encoding β-endorphin (8 amino acid residues of β-
endorphin plus 3 extra amino acid residues) was
incorporated after the leader sequence by mutagenesis.
The oligonucleotide used had the following sequence:
5'-AGGGTCATCGCCTTCAGCTCCGGATCCCTCAGAAGTCATAAACCCCCCATAGGC
TTTTGCCAC-3' (SEQ ID NO: 56). This mutagenesis also
removed some of the downstream sequences through the Spe
I site.

The second step in the construction of M13ED03
involved vector changes which put the β-endorphin
sequence in frame with the downstream pseudo-gene VIII
sequence and incorporated a Fok I site for joining with
a sublibrary of right half oligonucleotides. This vector
was designed to incorporate oligonucleotide populations
by mutagenesis using sequences complementary to those
flanking or overlapping with the encoded β-endorphin
sequence. The absence of β-endorphin expression after
mutagenesis can therefore be used to measure the
mutagenesis frequency. In addition to the above vector
changes, M13ED03 was also modified to contain an amber
codon at position 3... for biological selection during
joining of right and left half sublibraries.

The mutations were incorporated using standard
mutagenesis procedures as described in Example I. The
frame shift changes and the Fok I site were generated using
the oligonucleotide of SEQ ID NO: 57. The amber codon was
generated using the oligonucleotide
5'-CAiTITIMCCrrAAATCnACCAAC-3' (SEQ ID NO: 58). The full
sequence of the resultant vector, M13ED03, is provided in
Figure 8 (SEQ ID NO: 4).

The construction of right half oligonucleotide
sublibraries was performed in a modified form of the
M13IX42 vector. The new vector, M13IX421, is identical
to M13IX42 except that the amber codon between the Eco
RI-Sac I cloning site and the pseudo-gene VIII sequence
was removed. This change ensures that all expression from
the Lac Z promoter produces a peptide-gene VIII fusion
protein. Removal of the amber codon is performed by
mutagenesis using the following oligonucleotide:
5'-ACCTTCAGCCTCGGATCCGCC-3' (SEQ ID NO: 59). The full
sequence of M13IX421 is shown in Figure 9 (SEQ ID NO: 5).

Library Construction, Screening and Characterization of
Encoded Oligonucleotides

A sublibrary was constructed for each of the
previously described degenerate populations of
oligonucleotides. The left half population of
oligonucleotides was incorporated into M13ED03 to
generate the sublibrary M13ED03-L, and the right half
population of oligonucleotides was incorporated into
M13IX421 to generate the sublibrary M13IX421-R. Each of
the oligonucleotide populations was incorporated into
its respective vector using site-directed mutagenesis
as described in Example I. Briefly, the nucleotide
sequences flanking the degenerate codon sequences were
complementary to the vector at the site of incorporation.
The populations of oligonucleotides were hybridized to
single-stranded M13ED03 or M13IX421 vectors and extended
with T4 DNA polymerase to generate a double-stranded
circular vector. Mutant templates were obtained by uridine
selection in vivo, as described by Kunkel et al., supra.
Each of the vector populations was electroporated into
host cells and propagated as described in Example I.



The random joining of right and left half
sublibraries into a single surface expression library was
accomplished as described in Example I, except that prior
to digesting each vector population with Fok I, they were
first digested with an enzyme that cuts in the unwanted
portion of each vector. Briefly, M13ED03-L was digested
with Bgl II (cuts at 7094) and M13IX421-R was digested
with Hind III (cuts at 3919). Each of the digested
populations was further treated with alkaline
phosphatase to ensure that the ends would not religate
and then digested with an excess of Fok I. Ligation,
electroporation and propagation of the resultant library
were performed as described in Example I.

WO 92/06176                                PCT/US91/07141

The surface expression library was screened for
ligand binding proteins using a modified panning
procedure. Briefly, 1 ml of the library, about 10^^ phage
particles, was added to 1-5 of the ligand binding
protein. The ligand binding protein was either an
antibody or a receptor globulin (Rg) molecule, Aruffo et
al., Cell 61:1303-1313 (1990), which is incorporated
herein by reference. Phage were incubated with shaking with
the affinity ligand at room temperature for 1 to 3 hours,
followed by the addition of 200 µl of latex beads
(Biosite, San Diego, CA) which were coated with goat
anti-mouse IgG. This mixture was incubated with shaking for
an additional 1-2 hours at room temperature. Beads were
pelleted for 2 minutes by centrifugation in a microfuge
and washed with TBS, which can contain 0.1% Tween 20.
Three additional washes were performed, the last of which
did not contain any Tween 20. The bound phage were
then eluted with 200 µl 0.1 M glycine-HCl, pH 2.2, for 15
minutes and the beads were spun down by centrifugation.
The phage-containing supernatant (eluate) was removed, and
phage exhibiting binding to the ligand binding protein
were further enriched by one to two more cycles of
panning. Typical yields after the first eluate were
about 1 x 10^ - 5 x 10^ pfu. The second and third eluates
generally yielded about 5 x 10* - 2 x 10^ pfu and 5 x
10^ - 1 x 10^° pfu, respectively.

The second or third eluate was plated at a suitable
density for plaque identification screening and
sequencing of positive clones (i.e., plated at confluency
for rare clones and at 200-500 plaques/plate if pure plaques
were needed). Briefly, plaques were grown for about 6 hours
at 37°C and overlaid with nitrocellulose filters
that had been soaked in 2 mM IPTG and then briefly dried.
The filters remained on the plaques overnight at room
temperature and were then removed and placed in blocking
solution for 1-2 hours. Following blocking, the filters were
incubated in 1 ng/ml ligand binding protein in blocking
solution for 1-2 hours at room temperature. Goat
anti-mouse Ig-coupled alkaline phosphatase (Fisher) was
added at a 1:1000 dilution, and the filters were rapidly
washed with 10 ml of TBS or block solution over a glass
vacuum filter. Positive plaques were identified after
alkaline phosphatase development for detection.

Screening of the degenerate oligonucleotide library
with several different ligand binding proteins resulted
in the identification of peptide sequences which bound to
each of the ligands. For example, screening with an
antibody to β-endorphin resulted in the detection of
about 30-40 different clones, essentially all of which had
the core amino acid sequence known to interact with the
antibody. The sequences flanking the core sequences were
different, showing that the clones were independently derived
and not duplicates of the same clone. Screening with an
antibody known as 57 gave similar results (i.e., a core
consensus sequence was identified but the flanking
sequences among the clones were different).

EXAMPLE IV 

Construction of a Left Half Random Oligonucleotide Library

This example shows the synthesis and construction of
a left half random oligonucleotide library.

A population of random oligonucleotides nine codons
in length was synthesized as described in Example I,
except that different sequences were synthesized at their
5' and 3' ends so that they could be easily inserted
into the vector by mutagenesis. Also, the mixing and
dividing steps for generating random distributions of
reaction products were performed by the alternative method
of dispensing equal volumes of bead suspensions. The
liquid chosen, which was dense enough for the beads to
remain dispersed, was 100% acetonitrile.

Briefly, each column was prepared for the first
coupling reaction by suspending 22 mg (1 µmole) of 48
µmol/g capacity beads (Genta, San Diego, CA) in 0.5 ml
of 100% acetonitrile. These beads are smaller than those
described in Example I and are derivatized with a guanine
nucleotide. They also do not have a controlled pore
size. The bead suspension was then transferred to an
empty reaction column. Suspensions were kept relatively
dispersed by gently pipetting the suspension during
transfer. Columns were plugged and monomer coupling
reactions were performed as shown in Table XII.



Table XII

Column        Sequence (5' to 3')

column 1L     AA(A/C)GGCTTTTGCCACAGG
column 2L     AG(A/G)GGCTTTTGCCACAGG
column 3L     AT(A/G)GGCTTTTGCCACAGG
column 4L     AC(A/G)GGCTTTTGCCACAGG
column 5L     CA(G/T)GGCTTTTGCCACAGG
column 6L     CT(G/C)GGCTTTTGCCACAGG
column 7L     AG(T/C)GGCTTTTGCCACAGG
column 8L     AT(T/C)GGCTTTTGCCACAGG
column 9L     CC(A/C)GGCTTTTGCCACAGG
column 10L    T(A/T)TGGCTTTTGCCACAGG
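The bead quantities used in the column preparation are related by the support's loading capacity. This is a minimal sketch with hypothetical function names: 22 mg of 48 µmol/g support corresponds to about 1.06 µmol, consistent with the 1 µmole scale stated in the text.

```python
def bead_loading_umol(mass_mg: float, capacity_umol_per_g: float) -> float:
    """Nucleoside loading of a support: mass in grams times capacity in µmol/g."""
    return mass_mg / 1000 * capacity_umol_per_g


def bead_mass_mg(target_umol: float, capacity_umol_per_g: float) -> float:
    """Mass of support needed for a target synthesis scale."""
    return target_umol / capacity_umol_per_g * 1000


print(f"{bead_loading_umol(22, 48):.3f} µmol per column")   # 1.056
print(f"{bead_mass_mg(1.0, 48):.1f} mg for exactly 1 µmol")  # 20.8
```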



After coupling of the last monomer, the columns were
unplugged as described previously and their contents were
poured into a 1.5 ml microfuge tube. The columns were
rinsed with 100% acetonitrile to recover any remaining
beads. The volume used for rinsing was chosen so
that the final volume of total bead suspension was about
100 µl for each new reaction column into which the beads
would be aliquoted. The mixture was vortexed gently to
produce a uniformly dispersed suspension and then
divided into equal volumes, with constant pipetting of the
mixture. Each portion of beads was then transferred to an
empty reaction column. The empty tubes were washed with a
small volume of 100% acetonitrile, and the washes were also
transferred to their respective columns. Random codon
positions 2 through 9 were then synthesized as described in
Example I, except that the mixing and dividing steps were
performed using a suspension in 100% acetonitrile. The
coupling reactions for codon positions 2 through 9 are
shown in Table XIII.



Table XIII

Column        Sequence (5' to 3')

column 1L     AA(A/C)A
column 2L     AG(A/G)A
column 3L     AT(A/G)A
column 4L     AC(A/G)A
column 5L     CA(G/T)A
column 6L     CT(G/C)A
column 7L     AG(T/C)A
column 8L     AT(T/C)A
column 9L     CC(A/C)A
column 10L    T(A/T)TA



After coupling of the last monomer for the ninth
codon position, the reaction products were mixed and a
portion was transferred to an empty reaction column. The
column was plugged and the following monomer coupling
reactions were performed: 5'-CGGATGCCTCAGAAGCCCCXXA-3'
(SEQ ID NO: 60). The resulting population of random
oligonucleotides was purified and incorporated by
mutagenesis into the left half vector M13ED04.

M13ED04 is a modified version of the M13ED03 vector
described in Example III and therefore contains all the
features of that vector. The difference between M13ED03
and M13ED04 is that M13ED04 does not contain the five
amino acid sequence (Tyr Gly Gly Phe Met) recognized by the
anti-β-endorphin antibody. This sequence was deleted by
mutagenesis using the oligonucleotide
5'-CGGATGCCTCAGAAGGGCTTTTGCCACAGG-3' (SEQ ID NO: 61). The
entire nucleotide sequence of this vector is shown in
Figure 10 (SEQ ID NO: 6).



Although the invention has been described with
reference to the presently preferred embodiment, it
should be understood that various modifications can be
made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims.
SEQUENCE LISTING






(1) GENERAL INFORMATION: 

(i) APPLICANT: Huse, William D.

(ii) TITLE OF INVENTION: SURFACE EXPRESSION LIBRARIES OF
RANDOMIZED PEPTIDES 

(iii) NUMBER OF SEQUENCES: 61 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Pretty, Schroeder, Brueggemann & Clark

(B) STREET: 444 South Flower Street, Suite 2000 

(C) CITY: Los Angeles 

(D) STATE: California 

(E) COUNTRY: United States 

(F) ZIP: 90071 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.20

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Campbell, Cathryn A.

(B) REGISTRATION NUMBER: 31,815 

(C) REFERENCE/DOCKET NUMBER: P31 9072 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (619) 535-9001 

(B) TELEFAX: (619) 535-8949 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTAGTTTA 180
GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
TTTGAGGGGG ATTCAATGA^ TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
AAACATTTTA CTATTACCCC CTCTGGCAAA ACrrCTTTTG CAAAAGCCTC TCGC-^TTTT 600
GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGIG TTGCTCTTAC TATGCCTCGT 660
AATTCCmT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCA^CTG 720
ATGAATCrrr CTACCTGTAA TAATGTTGTT CCGTTAGTTC GimATTAA CGTAGATTTT 780
TCTTCCCAAC GTCCTGACTG GTATAATGAG CGAGTrCTTA AAATCGCATA AGGTAATTCA 840
CAATGATTAA AGTTGAAATT AAACCATGTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GGCAGCCTAT GCGCCTGGTC 1020
TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGAT-GACC 1080
GTCTGCGCCT CGTTCCGGCT AAGTAAGATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TrGGTATA.AT CGCTGGGGGT 1200
CAAAGATGAG TGTITTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
GTGGCArTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCGT 1320
CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCGGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
TGCGTGGGCG ATGGTTGTrG TCATTGTCGG CGCAAGTATC GGTATCAAGC TGTTTAAGAA 1500
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGGCTTTT 1560
TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTGCTTTC 1620
TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGGAA AACCCCATAC AGAAAATTGA 1680
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
TGGGTrCCTA nGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTGAGC CTCriAATAC TTrCATGTTT 2040
CAGAATAATA GGTTGCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
CAAGGCACTG ACCCCGTTAA AACTTATTAG CAGTACACTC CTGTATCATC AAAAGCCATG 2160
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTITAATGAA 2220
GATCCATTCG rTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGGGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCGG GTGGTGGCTC TGGTTCCGGT 2400
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG GTATGACCGA AAATGCCGAT 2460
GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT
GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTA.'^TGGTAA TGGTGCTACT 
GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 
TTAATGAATA ATTTCCGTCA ATAITTACGT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 
TTTGTCTTrA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 
TTCCGTGGTG TCTTTGCGTT TCITTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 
TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 
TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 
TTAAAAAGGG CTTCGGTAAG ATAGGTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 
GGCTTAACTC AATTCTTGTG GGTTATGTGT GTGATATTAG CGCTCAATTA CCCTCTGACT 
TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 
TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 
ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TCa^GAGG 
CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 
CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 
CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATrGGGCG CGGTAATGAT 
TCCTACGATG AAAATAAAAA CGGCTTGC^ GTrCTCGATG AGTGCGGTAC TTGGTTTAAT 
ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 
AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 
GGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 
TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 
GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 
ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG GTTTTTCTAG TAATTATGAT 
TCCGGTGTTT ATTCTTATTT AACGCGTTAT TTATCACACG GTCGGTATTT CAAACCATTA 
AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC AGGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 
GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 
GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 
ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 
GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 
[Sequence text for this portion of the listing is illegible in the source.]
AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600 

ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660 

CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720 

AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780 

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840 

TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900 

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAG TCTCAGGCAA 6960 

TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTGTCGGGCA TTAATTTATC 7020 

AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080 

TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 

TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7320 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 
AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 
TGTACACCGT TCATGTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 
GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 
CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 
CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 
GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 
CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 
CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 
TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 
TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 
TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 
TGGGTTCGTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAAGCTC CTGAGTAGGG TGATACACGT 
ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCGTGG TACTGAGCAA 
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 
CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 
CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGGTT TCCATTCTGG CTTTAATGAA 
GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 
GAAAACGGGC TACAGTCTGA CGGTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 
GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCGTTG GTAATGGTAA TGGTGCTACT 
GGTCATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 
TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 
TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 
TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 
TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGGC AGTTCTTTTG GGTATTCCGT 
TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGGTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGGAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTAGAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGGTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT GTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGGGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CGTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GGTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 
CCTCACCTCT GTTTTATCTT CTGCTGGTGC TTCGTTCGCT ^TTrm^TG GGGATGTTTT 
.CGGGTATCA GTTCGCGGAT T.^GACTA.^ TAGCCATTCA AA.A.ATAnGT CTCTGCCACG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAC^ATG TGCCTTITAT 
TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTA.^TA_AT CCAmCAGA CGATIGAGCG 
TCAAAATGTA GGTATriGCA TGAGCGTTIT TCCTGrTGCA ATGGCTGGCG GTA.ATATTGT 
TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTICT ACTCAGGCAA GTGATGTTAT 
TACTAATCAA AGAAGTATTG GTAGAACGGT TAATTTCCGT GATGGACAGA GTCrTTTACT 
CGGTGGCGTG AGTGATTATA AAAAGACTTG TCAAGArTCT GGCGT..CCGT TGCTGTCTAA 
^TCCCTTTA ATCGGCCTCC TGTTIAGCTC CCGCTCTGAT TCCA..CGAGG AAAGCACGTT 
ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 
GTGTGGTGGT TACGCGCAGG GTGACCGCTA GACTIGCCAG CGGCCTAGCG GGCGCTCCTT 
TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 
CGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACCGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TGGTICACGT AGTGGGCCAT CGCCCTGATA GACGGTITTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCTTI .^TAGTGGAC TCTTGnCCA AACTGGAACA ACACTC.AACC 
CTATGTCGGG CTATTCTTTr GArrTATA..G GGAriTIGCC GAmCGG.AA CCACCATGA.. 
ACAGGATTTT CGCCTGCTGG GGC.^^ACCAG CGTGGACCGC nCCTGCAAG TCTCTCAGGG 
CCAGGCGGTG A.AGGGCAATC AGCTGriGCC CGTCTCGCTG GTG.^W.GAA .^AACCACCCT 
GGCGCCCAAT ACGCAAAGCG CCTCTGCCGG CGGGTIGGGG GAnCA.nA.A TGCAGCTGGC 
ACGACAGGTT TGCCGAGTGG A.:.^GCGGGGA GTGAGCGCAA CGCAATTAAT GTGAGTTAGG 
TCACTCArrA GGCACCCCAG GCTTTAGACT TTATGGTTCC GGCTCGTATG TTGTGTGGAA 
TTGTGAGCGG ATAAGAATTT CAGACGGCAA GGAGACAGTC ATAATGAAAT ACCTATTGCG 
TACGGCAGGC GCTGGAriGT TATTACTCGG TGCGCA.AGCA GCCATGGCCG AGCTCGTGAT 
GACCCAGACT CCAGA^ATrGC ATGGGGAATG AGTGTTA..TT CTAGAACGCG TAAGCTrGGC 
ACTGGCCGTG GTrTTACAAC GTGGTGACTG GGAAA.ACCCT GGCGTTACCG AACTTAATCG 
CCTIGGAGCA CAGGCGCCTI TCGCGAGCTG GCGTAAXAGC GAAGAGGCCC GCACCGATCG 
GCCTTCCCAA GAGTTGGGGA GCCTGAATGG CGAATGGGGC TTIGCCTGGT TIGCGGGACG 
AGAAGCGGTG GCGGAAAGGT GGCTGGAGTG GGATCTICCT GAGGCCGATA CGGTCGTCGT 
CCCCTCAAAC TGGCAGATGG ACGGTTACGA TGGGCCCATC TACACCAACG TAACCIATCC 
CATTAGGGTC AATCCGCCGT TTGTTCCCAC GGAGAATCCG ACGGGTIGTr ACTCGCTCAC 
ATTTAATGTT GATGAAAGCT GGCTAGAGGA AGGCCAGACG CGAATTATTI TTGAIGGCGT 
TCCTATTGGT TAAAAAATGA GGTGATTTAA CAAAAATTTA ACGCGAATTT TA.ACAAAATA 
TTAAGGTTTA CAATITAAAT ATTTGCTTAT ACAATGTTGG TGTTmGGG GCTTTrCTGA 
TTATCAACCG GGGTACATAT GATTGACATG CTAGTTTTAC GAmCCGTT CATCGATTCT 
CTTGTTTGCT CCAGACTCTC AGGCAATGAC CTGATAGCCT TTGTAGATCT CTCAAAAATA 7020 

GCTACCCTCT CCGGCATTAA TTTATGAGCT AGAACGGTTG AATATCATAT TGATGGTGAT 7080 

TTGACTGTCT CCGGCCTTTC TCACCCTTTT GAATCTTTAC CTACACATTA CTCAGGCATT 7140 

GCATTTAAAA TATATGAGGG TTCTAAAAAT TTTTATCCTT GCGTTGAAAT AAAGGCTTCT 7200 

CCCGCAAAAG TATTACAGGG TCATAATGTT TTTGGTACAA CCGATTTAGC TTTATGCTCT 7260 

GAGGCTTTAT TGCTTAATTT TGCTAATTCT TTGCCTTGCC TGTATGATTT ATTGGACGTT 7320 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGGAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GGCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

GAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTCGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 
CAAAGCCTCT GTAGCCGTTG CTACCGTCGT TCCGATGCTG TGinCGCTG CTa^GGGTGA 
CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 
TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 
ATTCACCTCG AAAGCAAGCT GAT^J^J^CCGA TAC.A.ATTAA.. GGCTCCTTTT GGAGCCTTTT 
TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 
TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 
TGGGTTCCTA TTGGGGTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 
ATTCCGGGCT ATACTTATAT CAACCGTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 
CAGAATAATA GGTTCCGAAA TAGGGAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 
CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 
GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 
GGGGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 
GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 
GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 
TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 
TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 
TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 
TTTGCTAAGA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 
TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 
TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTGTTGCT CTTATTATTG 
GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 
TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 
TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 
ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 
CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGGAACTAAT 3300 
CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TGGCTAAAAC GCCTCGCGTT 
CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 
TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 
ACCCGTTCTT GGAATGATAA GCAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 
CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 
TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 
GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 
ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 
TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 
AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 
TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 
TGTAACTTGG TATTCAAAGG AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 
TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 
TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 
TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 
TTTTAAAATT AATAACCTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 
TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 
AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 
TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 
CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 
TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 
TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGGG GTAATATTGT 
TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 
TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 
CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 
AATCCCTTTA ATCGGCCTCG TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 
ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 
GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 
TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 
GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCG GATTTCGGAA CCACCATCAA 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGGAAC TCTCTCAGGG 
CGAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 
GTGACTGGGA AAACCGTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 
AAGCACTATT GCACTGGCAG TCTTACCGTT ACCGTTACTG TTTACCCCTG TGACAAAAGC 
CGCCCAGGTC CAGCTGCTCG AGTCAGGCCT ATTGTGCCCA GGGGATTGTA CTAGTGGATC 
CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT AGTTTACAGG CAAGTGCTAC 
TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA GTTGGTGCTA CCATAGGGAT 
TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAAGCA ATAGCGAAGA GGCCCGCACG 
GATCGCCCTT CCCAACAGTT GGGCAGCCTG AATGGGGAAT GGCGCTTTGC CTGGTTTCCG 
GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CGATACGGTC 
GTGGTCCCCT CAAACTGGCA GATGCACGGT TACGATGCGC CCATCTACAC CAACGTAACC 
TATCCCATTA CGGTCAATCC GCCGTTTGTT CCCACGGAGA ATCCGACGGG TTGTTACTCG 
GTGACATTTA ATGTTGATGA AAGCTGGCTA CAGGAAGGCC AGACGCGAAT TATTTTTGAT 
GGCGTTCCTA TTGGTTAAAA AATGAGCTGA TTTAAGAAAA ATTTAACGCG AATTTTAACA 
AAATATTAAC GTTTACAATT TAAATATTTG CTTATACAAT GTTCCTGTTT TTGGGGCTTT 
TCTGATTATC AAGCGGGGTA CATATGATTG ACATGGTAGT TTTACGATTA CCGTTCATCG 
ATTCTCTTGT TTGCTCCAGA CTCTGAGGCA ATGACGTGAT AGCCTTTGTA GATCTCTCAA 
AAATAGCTAC CCTCTCGGGC ATTAATTTAT CAGCTAGAAC GGTTGAATAT CATATTGATG 
CTGATTTGAC TGTCTCCGGC CTTTCTCACG CTTTTGAATC TTTACCTACA CATTACTCAG 
GCATTGCATT TAAAATATAT GAGGGTTCTA AAAATTTTTA TCCTTGCGTT GAAATAAAGG 
CTTCTCCGGC AAAAGTATTA CAGGGTCATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 
GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440 
ACGTT 7445 
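
The numbered lines in these listings follow a simple convention: bases are grouped in blocks of ten, and the integer at the end of a line is the running total of bases through that line, so the last number should equal the stated LENGTH. A minimal Python sketch for checking that convention is shown here, assuming only what is visible in this listing; `check_listing` and its `start` parameter are hypothetical names introduced for illustration, not part of any standard sequence-listing tool.

```python
import re

# Hypothetical helper (not from the source): checks running totals in
# sequence-listing lines such as "AATGCTACTA CTATTAGTAG ... 60".
BLOCK_RE = re.compile(r"^[ACGTN]+$")


def check_listing(lines, start=0):
    """Count bases and compare against each line's trailing running total.

    Returns (total, mismatches), where mismatches is a list of
    (line_index, stated_total, counted_total). Blank lines are skipped.
    Raises ValueError on any block containing non-base characters,
    e.g. OCR debris such as 'r', 'I' or '^'.
    """
    total = start
    mismatches = []
    for i, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue  # blank separator line
        # A trailing all-digit token is taken as the stated running total.
        stated = int(tokens.pop()) if tokens[-1].isdigit() else None
        for block in tokens:
            if not BLOCK_RE.match(block):
                raise ValueError(f"line {i}: unexpected characters in {block!r}")
            total += len(block)
        if stated is not None and stated != total:
            mismatches.append((i, stated, total))
    return total, mismatches


if __name__ == "__main__":
    tail = [
        "GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440",
        "ACGTT 7445",
    ]
    print(check_listing(tail, start=7380))
```

Applied to the last two lines of SEQ ID NO: 3, starting from a running total of 7380, the sketch reports a total of 7445 with no mismatches, which agrees with the stated LENGTH of 7445 base pairs. Rejecting any non-base character, rather than silently skipping it, is a deliberate choice here so that residual OCR errors surface instead of corrupting the count.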

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7409 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGG TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTGTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG GGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 
.TTCACCTCG AAAGCAAGCT GATA-^CCG. TAC^AAA GGCTCGTTTT GGAGCCTTTT 
TTTTIGGAGA rTTTCAACGT GAAA^-.^TTA TTATTCGC..A TTCCTTIAGT TGTTGCTTTC 
TATTCTCACT CCGCTGAAAC TCriG.^AAGT TGTTTAGCA.A AACCCCATAC AGAAAATTCA 
TTTACTAAGG TCTGGAAAGA CGAC.AAAACT TrAGATCGTT ACGCTA.ACTA T.AGGGTTGT 
CTGTGGAATG CTAGAGGCCT TGTAGTTTGT ACTGGTGACG AAACTGAGTG TTACGGTAGA 
TGGGTTCGTA TTGGGCTTGC TATCCCTGA.A A.ATGAGGGTG GXGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTA.AACCTC CTGAGTACGG TGATACACCT 
InCGGGGGT ATAGTTATAT GAACGCTCTC GACGGCAGIT ATCCGCCTGG TACTGAGCAA 
AACCCCGCTA ATGCTAATCG TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 
CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 
CAAGGCACTG ACCCCGTTAA AACTTATTAC GAGTACACTC CTGTATCATC AAAAGCCATG 
TATGACGCTT ACTGGAACGG TAA..TTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 
GATCCATTCG TTrGTCAATA TCAAGGCCA.. TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGGTCTG AGGGTGGTGG CTCTGAGGGT 
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTGCGGT 
GATTTTGATT ATG..AAAGAT GGC.A.^XGCT A.ATA.AGGGGG CTATGACCGA AAATGCCGAT 
GAAAAGGCGC TACAGTGTGA GGCT...^^GGC AA..CTTGArr CTGTCGGTAC TGATTACGGT 
GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTA..TGGTAA TGGTGCTAGT 
GGTGATTrTG CrGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA T.-ATTCACCT 
TTAATGAATA ATITCCGTCA ATATTTAGGT TGCCTGCGTC AATCGGTTGA ATGTCGGCGT 
TrrGTCTTTA GCGCTGGTA-. ACCAIATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 
TTCCGTGGTG TGTrrGCGTT TCTTTTATAT GTTGGGAGGT TTATGTATGT ATTTTGTACG 
rrrGCTAACA TACTGCGTA.A TAAGGAGTCT TAATCATGCC AGTrCTTTIG GGTATTCCGT 
TATTATTGCG TTTCCTCGGT TICCnCTGG TAACTnGTT CGGCTATCTG CTTACTTrTC 
TTAAAAAGGG GTIGGGTAAG ATAGCTATTG GTATTrCATT GTrTCTrGCT CTTATTATTG 
GGCTTAAGTG AATrCTIGTG GGTTATCTCT CTGATATIAG CGCTGAATTA CCCTCTGACT 
TTGTICAGGG TGTIGAGrrA ATTCTCGGGT CTAATGGGCT TCGCTGTTTT TATGTTATTC 
TGTCTGTAAA GGCTGCTATT TTCATTTnG ACGTTAAACA AAAAATCGTT TCTTATTTGG 
ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 
CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 
CTrGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 
CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 
TCGTAGGATG AAAATAAAAA GGGCTTGCTT GTTGTCGATG AGTGGGGTAG TTGGTTTAAT 
ACCCGTTCn GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTrTCT ACATGCTCGT 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACGT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATGAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA MTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTCT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 
TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 
GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 
CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 
GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 
AAGCACTATT GCACTGGCAC TCrTACCGTT ACTGnTACC CCTGTGGCAA A..GCCTATGG 
GGGGTTTATG ACTTCTGAGG GATCCGGAGC TGAAGGCGAT GACCCTGCTA AGGCTGCATT 
CAATAGTTTA CAGGCAAGTG CTACTGAGTA CATTGGCTAC GCTTGGGCTA TGGTAGTAGT 
TATAGTTGGT GCTACCATAG GGATTAAATT ATTCAAAAAG TTTACGAGCA AGGCTTCTTA 
AGCAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC AGTTGCGCAG CCTGAATGGC 
GAATGGCGCT TTGCCTGGTT TCCGGCACCA GAAGCGGTGC CGGAAAGCTG GCTGGAGTGC 
GATCTTCCTG AGGCCGATAC GGTGGTCGTC CCCTCAAACT GGCAGATGCA CGGTTACGAT 
GCGCCCATCT ACACCAACGT AACCTATCCC ATTACGGTCA ATCCGCCGTT TGTTCCCACG 
GAGAATCCGA CGGGTTGTTA CTCGCTCACA TTTAATGTTG ATGAAAGCTG GCTACAGGAA 
GGCCAGACGC GAATTATTTT TGATGGCGTT CCTATTGGTT AAAAAATGAG CTGATTTAAC 
AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTTTAC AATTTAAATA TTTGCTTATA 
CAATCTTCCT GTTTTTGGGG GTTTTCTGAT TATCAACCGG GGTACATATG ATTGACATGC 
TAGTTTTACG ATTACCGTTC ATGGATTCTC TTGTTTGCTC CAGACTGTCA GGCAATGACC 
TGATAGCCTT TGTAGATCTC TCAAAAATAG CTACCCTCTC CGGCATTAAT TTATCAGCTA 
GAACGGTTGA ATATCATATT GATGGTGATT TGACTGTCTC CGGCCTTTCT CACCCTTTTG 
AATCTTTACC TACACATTAC TCAGGCATTG CATTTAAAAT ATATGAGGGT TCTAAAAATT 
TTTATCCTTG CGTTGAAATA AAGGCTTCTC CCGCAAAAGT ATTACAGGGT CATAATGTTT 
TTGGTACAAC CGATTTAGCT TTATGCTCTG AGGCTTTATT GCTTAATTTT GCTAATTCTT 
TGCCTTGCCT GTATGATTTA TTGGACGTT 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAA^.GATGT TGAGGTACAG GACGAGATTC AGGAATTAAC CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATGCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTGCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 
^.CTAACC TC™. -C^CT .C=CX«C.. T .0 

0X0X03^X0 TOX.CXXX.X .CXCOXCCC .^CTCOXC ~ 

Xc™ TXCCOCXXCC T.XCCCX=^ MmC.CXC OXCCCXCXC^ — 
TCXOACCGXC GCCmCXC. OOCTCOCOOX .CXMACCXC CTG.OXACCC IGmCACOX 
:LgGX .X.CXX.X.X C.GCGXCXC G.CGOC.CXX .XGCGCGXGG — 
«.CGGGGCX. ATCCXAA.CC XXCXCXXG.G G.GXCXC.GC CXGrt^.TAC TTrCAX m 
C.G..TMX. GGXXCCCM. — GGG GC.XX..CXG XTX.X.CGGG G^= 
C..GGG.CXG .GCGGGXT.. MGXX.XX.G GAGX.C.CXC CXGX.XC.XC - 
X.XO.CCCXX .CXGG«.CaO XMAXXC.G. GACXCCGGXX XCC.XXCXGG CTX^M CM 
G.XGG.XXCG XTXCXGMX. XCMOGCGM XGCXCXO.CG XCCCXCMCC XC . M 
.CXGCCCCGG GCXCXCGXCG XCGXXCXGGX CGCGCCXCXG AGGGXGGXGG « 
OGCGCXXCXG AGGGXCGOGC CXCXG.GGG. GGCGOXrCCG GXGGTCGCXC XGGXTCGGG 
OAXXXXGm AXGAAAAOAX GGCAAACGCX AAXAAGGGGG CXAXGAGGGA AAAXGCCGAX 
CAAAACGCGC XAGAGXCXGA CGCXAAAGGC AAACXTGAXX CXGXCGCTAC XaAXXAGCGT 
CGXGCXAXGO AXGGXTXGAX XGGXGACGXX XGGGGCCXTG CXA.AXGOXAA X— 
.OXGAXTTTG GXGOGXGXAA XXGCGAAAXG CGXGAA.GXGG GXGACGCXGA XMXrC 
XXAAXCAAXA AXTXCCGXCA AXAXXTAGCX XCCCXCGGXC .AXCGOXXGA 
xrXGXGTTXA GCGCXGGXAA AGCAXAXGAA ^rXG AXXGXGACAA AAXAMC^A 
XXCCGXGGXG XCXXXOCGTX XCXTTTAXAX GXTGCCACCX XXAXGXAXGX A^ 
XXXGGXAACA XACXGGGXAA XAAGGAGXGX XAAXCAXGGC AGXTCXXXXC ^AXK^ 
XAXXAXXGGO XXXCCXOGOX XICCrXGXGG XAACXTXGXX GGOCXATGXG CXXACTXXXC 

:Laggg cxxcggxaag axaccxaxxg cxa™ GxncxxG. crxA.^ 

GGCTXAA^O AAXXGXXGXG GGXXAXGXCX GXGAXATTAG CGCTGAAXXA CCGX^-- 
XXGXXGAGGG XOXXCAGTTA AXXCXCCCGX CTAAXGCGGX XOCCXGXXTX XAXGXX^C 
^"gxaaa OGOXGCXATT XXCAXXXXXG AGGXXAAACA AAAAAXCGXX XCrXA^TGC 
a::^:: AXAAXAXGGC XGXXXAXXXX GXAACXGGCA AA.AGGC.C X— 
CXCOXXAGGG XXGGXAAGAX XCAGGAXAAA AXXGXAGCXG GOXGCAmX ^^^TAA^ 
GXTGATTIAA GGCXXGAAA.A GCXGCGOCAA GXCGGGACCX XCGCXAAAAC GCGXCGCGTr 
:^:AAXAG CCGAXAAGCG XXCXAXAXGX GAXXXGCrXG GXAXXG^ 
XGCXAGGAXG AAAAXAAAAA GCGGXXGGTr GTTCXGGAXG ACTTGCGG^^^ TT™ 

AGCCGrrcrr gcaaxgataa ggaaagacac ccgatxatto AxrGGTXXcx ac^^- 

MAXXAGGAX GGGAXAXXAX CTTCGXXGXt CAGGAC^AX CTAXIGXTGA XAAACAGGGG 
t^CCAT XAGCXGAAGA TGXXGXXXAX XGXCGXGGXG XGGAGAGAAX XAGTXXACCX 
^Gxrc- CXXXAXAXrC XCXXAXXAGX GCCXGGAA.AA .CCCXGXGGG XAAAXXAGAX 
GTTGGCGTTG TTAAATATGG CGATTCTGAA TTAAGCCCTA CTGTTGAGCG TTGGGTTTAT 
ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 
TCCGGTGTTT ATTCTTATTT AACGGCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 
AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 
GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 
GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 
ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 
GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 
AATCCAAACA ATCAGGATTA TATTGATGAA TTGCCATCAT CTGATAATCA GGAATATGAT 
GATAATTCCG CTCCTTCTGG TGGTTTCTTT GTTCCGCAAA ATGATAATGT TACTCAAACT 
TTTAAAATTA ATAACGTTGG GGCAAAGGAT TTAATACGAG TTGTCGAATT GTTTGTAAAG 
TCTAATACTT CTAAATCCTC AAATGTATTA TCTATTGACG GCTCTAATCT ATTAGTTGTT 
AGTGCACCTA AAGATATTTT AGATAAGCTT CCTCAATTCC TTTCTACTGT TGATTTGCCA 
ACTGACCAGA TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTTTAGAT 
TTTTCATTTG CTGCTGGCTC TGAGCGTGGC ACTGTTGCAG GCGCTGTTAA TACTGACCGC 
CTCACCTCTG TTTTATCTTC TGCTGGTGGT TCGTTCGGTA TTTTTAATGG CGATGTTTTA 
GGGCTATCAG TTCGCGCATT AAAGACTAAT AGCCATTCAA AAATATTGTC TGTGCCACGT 
ATTCTTACGC TTTCAGGTCA GAAGGGTTCT ATCTCTGTTG GCCAGAATGT CCCTTTTATT 
ACTGGTCGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAG GATTGAGCGT 
CAAAATGTAG GTATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT 
CTGGATATTA CCAGCAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATGTTATT 
ACTAATCAAA GAAGTATTGC TACAACGCTT AATTTGCGTG ATGGACAGAC TCTTTTACTC 
GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTG GCGTACCGTT CCTGTCTAAA 
ATCCCTTTAA TCGGCCTCCT GTTTAGCTGC CGCTCTGATT CCAACGAGGA AAGCACGTTA 
TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGGGGG 
TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT 
CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 
GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA 
TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTTGAG 
GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTGCAA ACTGGAACAA CACTCAACCC 
TATCTGGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC CACCATCAAA 
CAGGATTTTC GCCTGCTGGG GCAAACCAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 
CAGGCGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 
GCGCCCAATA CGCAAACCGC CTCTCCCCGC GCGTTGGCCG ATTCATTAAT GCAGCTGGCA 
CGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGTTAGCT 
CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTGTGGAAT 
TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 
GTAGGAGAGC TCGGCGGATC CGAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 
AGTTTACAGG CAAGTGCTAC TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 
GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 
GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 
ATGGCGAATG GCGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 
AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 
ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 
CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 
AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 
TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 
TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 
CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 
TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGGA TTAATTTATC 
AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 
TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 
AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 
TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 
TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7394 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GCTTTTTATC CTCGTCTGCT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

GAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTAGCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAAGTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTAGGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 
^TTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 
rAAGGGACTG ACCCCGTTAA Ai^CTTATTAL OAoin 

iccXX .CTC=«.CCC .^.TXCC. O.C.OC.CTX cmM.OM 
LcnCC rr™. TCMCCCC. tCCTCXC.CC TCCCTCMCC TCCTrTOM 

CCXC.COXCC xccrrcxccx cococctcxc .ccc.cc.cc ccccc 

CCCCCTTCXO .CCCXCCCCC CXCXC.CCCA CCCCCXXCCC CXCCXCCCXC XCCXTCC 
OAXXTXCAXX AXCA.^.CAX CCCAMCCCX AAXMCCCCC CXAXCACCCA AAAICCCCAT 
OAAAAOCCCC XAOACXCTCA CCCXAA..CCC AAACTTCAXX CXOXCCCTAC XCAXXACCCX 
™CC AXCCXXXCAX XCCXCACCXX XCCCCCCXXC CXAAXCCXAA XCCXCCXACX 
CCXCATXXXC CXCCCXCXA. XXCCCAAAXC CCTCAACXCC CXCACCCXCA TMn A ^ 
XTAATCAATA AXTXCCCXCA AXAXXXACCX XCCCXCCCXC AAXCCCXXCA AXCXCCCCC^ 
XnCXCXTXA CCCCXCCXAA ACCAXAXCAA XXXXCXAXXC AXXCXCACAA AAXA^^CXTA 
XXCCCXCCXC XCXXXCCCXX XCrnXAXAX CXXCCCACCX TXAXCTAXCX AXrrXCTACC 
^CCXAACA XACXCCCXAA T.-.CCACXCT TAAXCAXCCC AGXXCXXTTC CCXAXXCCCX 
XAXXAXTCCC XXXCCXCCCX XXCCXXCXCC XA..CTrXCXX CCCCXAXCTC CXXACXXXXC 
„^CCC CXXCCCXA.C AXACCTATTC CXArTXCAXX CXtXCXtCCX CTIAXXATXC 
TcCTTAACXC AAXXCrrCXC CCXTAXCXCX CXCAXAXXAC CCCXCAAXXA CCCTCTOACT 
: rACCC XCXXCACXXA .X.CXCCCCX CXAAXCCCCX XCCCXCXXXX Xa™ 
XCXCXCTAAA CCCXCCXAXT XXCAXXXXXC ACCXTAAACA A^AAXCCXX XCXTA^C 
AXXCCCAXAA AXAAXAXCCC XCXXXAXXXT CXAACXCCC AAXXACCCXC XCCAAACACC 
TcCXXACCC XXCCXAACAX XTACCAXAAA AXXCXACCXC CCXCCAAAAX ACCAACXA^ 3300 
CXXCAXTTAA CCCXXCAAAA CCXCCCCCAA CXCCCCACCX XCCCXAAAAC CC™ 
CXXACAAXAC CCCAXAACCC tXCXAXAXCX CAXXXCCXXC CXAXTCCCCC CCCXAATCAX 
AAAAXAAAAA CCCCXXCCXX CXXCXCCAXC A— ™ 
.CCCCXXCXr CCAAXCAXAA CCAAACACAC COCAXXAXXC AXXCCXXXCX AC^OCTC^ 
^XXACCAX CCCAXAXTAX XXTTCXXCXX CAOCACXXAX .TAXXCXXCA ^^CC 
COXXCXCCAX XACCXCAACA TCXTCXXXAX XCXCCXCCXC XCCACACAAX XA^X 
XXTCXCCCXA crrXAXAXXC XCXXAXXACX CCCXCCAAAA XCCCXCXCCC x^x^x 

orrocccxTc ixaaaxaxoc ccaxxcxcaa xxaaccccxa cxcxxcaccc xiocctxxax 

"CA AXXXCXAXAA CCCAXAXCAX ACXAAACACC ™X- — 

xccccxonx AXXcriAxrx aaccccxxax xxaxcacacc gxccoxaxtx caaaccaxxa 
ir^c acaacaxcaa ccxxacxaaa axaxaxxxca aaaacxxxxc accccxx^ 
TxXCCCA XXCCAXXXCC AXCACCAXrr acaxaxacxx —a ac^^^ 
..OCXXAAAA accxacxcxc xcacaccxax caxxxxcaxa aaxxcacx^ x^^- 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 
TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 
TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 
TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 
TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 
TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 
TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 
TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 
AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 
TTTTTCATTT GCTGCTGGCT CTGAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 
CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 
TACTGGTCGT GTGACTGGTG AATCTGCGAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 
TCAAAATGTA GGTATTTGCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 
TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 
TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 
CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTGTAA 
AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCGAACGAGG AAAGCACGTT 
ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 
GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 
TCGCTTTCTT CCCTTCCTTT CTCGCGACGT TCGCCGGCTT TCCCCGTCAA GCTGTAAATC 
GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTGTCAGGG 
CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA GGCACCCGAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
,,^Tr irTTrrCiCI GGCCCTCGTT ITACAACGTC 
TIGTGAGCGG AIMCAATn GACACGGGTG AGTTCOCAC ^,,CAAAGA 

...rrrTCCC GTTACCCAAC CTTTGTACAl GGAGAAAATA AAOl 
GTGACTGGGA AAACCGTGGC ^„,™,CC CCTGTGGCAA AAGGCCTTGT 

MGCACIATT GCACTGGCAC TCTTAGCGTT ACTGmA C 
OAGCCATCCG GGAGC.GAAG CGGATGACCG TGGXAAG C~ 

— ~ rr: — 

CATACGCArr AAAtXATTGA AAAAGXm ^^^^^^^ 
GCCGGCAGCG ATCOGCCTTG CGAAGA^G - ^^^^^^^^ 

.GGTrrCGGG GAGCAGAAGC GCTGGCGGA^ A ^^^^^^^^^^ 
....GGGTGG XGGTGGCCTG AAACTGG ™ « ^^^^^^^^^^ 
^AACCr AXCGCATTAG GGXCAA « ^^^^^^^^ 
TOTACTCGC TCAGATTTAA IGTTGATGAA AGCTGGCTAG A ^,,„„, 
™AXG GGGtrCCTAX XGGXXAAAAA AXGAGCXGAX XXAACAAAAA XTXAAG^A 
AXTUXGAIl. 1,1. „.,.mGC XIAIAOAAXC TrccxGim 

.xmAAGAA AAXAXXAAGG XnAGAATH AAATAXXXGC m 
TGGGGGIXTX CXGATIATCA ACCGGGGXAC ATATGAXXGA CAXGCIAGXX TI 

iGGGGcrrrr -^riCccA^ tgaccigaia GccrnGXAC 

CGrXGAXCGA TrCXGTXGTI XGCXGGAGAC XCXCAGC ^. 
.XCrCrCAAA AAXAGGXAGC GXGXGGGGCA XXAAm.. A^ GM 
.XAXXGAXGG XGAXXXGAGX CXCXCGGGCC XTTCXG., =C 

.xxagxgagg GArrGGAxrx a^.xaxaxg agggxxgx^ 

^XAAAGGG XTCXGGCGGA AAAGXAXXAG AGGGX^ XG^ 
XAOGXXIAIG GXCIGAGGCX TTATIGCnA ATmG.TAA TIGXXXGCCX 

ATTTATTGGA CGTT 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 37 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TACGAGCAAG GCTTCTTA 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AGCTTAAGAA GCCTTGCTCG TAAACTTTTT GAATAATTT 39 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AATCCCTATG GTAGCACCAA CTATAACTAC TACCAT 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGCCGAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
ATCGCCTTCA GCCTAG 16 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CTCGAATTCG TACATCCTGG TCATAGC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CATTTTTGCA GATGGCTTAG A 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TAGCATTAAC GTCCAATA 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATATATTTTA GTAAGCTTGA TGTTCT 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GACAAAGAAC GCGTGAAAAC TTT 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GCGGGCCTCT TCGCTATTGC TTAAGAAGCC TTGCT 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TTCAGCCTAG GATCCGCCGA GCTCTCCTAC CTGCGAATTC GTACATCC 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TGGATTATAC TTCTAAATAA TGGA 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
AATTCGCCAA GGAGACAGTC AT 22 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 39 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT 39 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT 39 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TCTAGAACGC GTC 13 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
ACGTGACGCG TTCTAGAATT AACACTCATT CCTGT 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCCGCTGCC 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
GTAGGCAATA GGTATTTCAT TATGACTGTC CTTGGCG 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
TGACTGTCTC CTTGGCGTGT GAAATTGTTA 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CAATTTTATC CTAAATCTTA GCAAC 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CATTTTTGCA GATGGCTTAG A 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CGAAAGGGGG GTGTGCTGCA A 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
TAGCATTAAC GTCCAATA 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GGCGTTACCC AAGCTTTGTA CATGGAGAAA ATAAAG 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TACTGTTTAC CCCTGTGACA AA.^GCCGCCC AGGTCCAGCT GC 



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG 



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GGCACAATAG GCCTGACTCG AGCAGCTGGA CCAGGGCGGC TT 



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TTGTCACAGG GGTAAACAGT AACGGTAACG GTAAGTGTGC CA 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC AA 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TAACGGTAAG AGTGCCAGTG C 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 68 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL 

MIXTURE OF A AND C AT THIS LOCATION AND AT 
LOCATIONS 28, 31, 34, 37, 40, 43, 46 & 49" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
AGCTCCCGGA TGCCTCAGAA GATGMNNMNN MNNMNNMNNM NNMNNMNNMN NGGCTTTTGC 
CACAGGGG 
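The feature note for SEQ ID NO: 52 uses IUPAC degenerate codes, and the size of the resulting random-codon population follows directly from them. The short Python sketch below is offered only as an illustration: it checks the M positions listed in the note and counts how many distinct oligonucleotides the degenerate primer could encode. It assumes the degenerate region reads as nine MNN triplets starting at position 25 (consistent with the positions in the note) and that N carries its standard IUPAC meaning of any of A, C, G or T; the names `IUPAC`, `degenerate_positions` and `complexity` are hypothetical helpers, not part of the original disclosure.

```python
# Hypothetical helper table: IUPAC codes used here, mapped to the bases
# each one stands for (N = any base, M = A or C, per standard IUPAC usage).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "N": "ACGT"}

# SEQ ID NO: 52, assuming the degenerate region is nine MNN triplets.
SEQ52 = (
    "AGCTCCCGGA" "TGCCTCAGAA" "GATGMNNMNN" "MNNMNNMNNM"
    "NNMNNMNNMN" "NGGCTTTTGC" "CACAGGGG"
)


def degenerate_positions(seq: str, code: str) -> list[int]:
    """Return the 1-based positions at which `code` occurs in `seq`."""
    return [i + 1 for i, base in enumerate(seq) if base == code]


def complexity(seq: str) -> int:
    """Count the distinct concrete sequences a degenerate oligo encodes."""
    total = 1
    for base in seq:
        # Each position multiplies the count by the number of bases it allows.
        total *= len(IUPAC[base])
    return total


if __name__ == "__main__":
    print("length:", len(SEQ52))
    print("M positions:", degenerate_positions(SEQ52, "M"))
    print("distinct sequences:", complexity(SEQ52))
```

Under these assumptions the region contributes nine two-fold (M) and eighteen four-fold (N) positions, so the primer encodes 2^9 x 4^18 = 2^45 (about 3.5 x 10^13) distinct sequences, far more than any practical transformation can sample.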

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL 
MIXTURE OF A AND C AT THIS LOCATION AND AT 
LOCATIONS 20, 23, 26, 29, 32, 35, 38, 41, 44 & 50" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
CAGCCTCGGA TCCGCCMNNM NNMNNMNNMN NMNNMNNMNN MNNMNNATGM GAAl 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
GGTAAACAGT AACGGTAAGA GTGCCAG 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GGGCTTTTGC CACAGGGGT 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
.caOTC^tCO cmCGCTC CGCAtCCCIC MCCOCCC.X .CG.tnTCC 

CAC 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
TCGCCTTCAG CTCCCGGATG CCTCAGAAGC ATGAACCCCC CATAGGC 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
CAATTTTATC CTAAATCTTA CCAAG 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GCCTTCAGCC TCGGATCCGC C 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
CGGATGCCTC AGAAGCCCCN N 



WO 92/06176 



PCT/US91/07141 



105 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
CGGATGGCTC AGAAGGGCTT TTGCCACAGG 
I CLAIM: 



1. A composition of matter comprising a 
plurality of cells containing a diverse population of 
expressible oligonucleotides operationally linked to 
expression elements, said expressible oligonucleotides 
having a desirable bias of random codon sequences 
produced from random combinations of first and second 
oligonucleotide precursor populations having a desirable 
bias of random codon sequences. 

2. The composition of claim 1, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

3. The composition of claim 1, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is biased toward a 
predetermined sequence. 

4. The composition of claim 1, wherein said 
first and second oligonucleotides having random codon 
sequences have at least one specified codon at a 
predetermined position. 

5. The composition of claim 1, wherein said 
cells are procaryotes. 

6. The composition of claim 1, wherein said 
cells are E. coli.
7. A kit for the preparation of vectors useful
for the expression of a diverse population of random
peptides from combined first and second oligonucleotides
having a desirable bias of random codon sequences,
comprising two vectors: a first vector having a cloning
site for said first oligonucleotides and a pair of
restriction sites for operationally combining first
oligonucleotides with second oligonucleotides; and a
second vector having a cloning site for said second
oligonucleotides and a pair of restriction sites
complementary to those on said first vector, one or both
vectors containing expression elements capable of being
operationally linked to said combined first and second
oligonucleotides.

8. The kit of claim 7, wherein said vectors 
are in a filamentous bacteriophage. 

9. The kit of claim 8, wherein said 
filamentous bacteriophage are M13.

10. The kit of claim 7, wherein said vectors 
are plasmids. 

11. The kit of claim 7, wherein said vectors 
are phagemids. 

12. The kit of claim 7, wherein the desirable 
bias of random codon sequences of said first and second 
oligonucleotides is unbiased. 

13. The kit of claim 7, wherein the desirable 
bias of random codon sequences of said first and second 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 
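Claims 13, 18 and similar claims recite populations that are "diverse but biased toward a predetermined sequence." One way to picture such a bias is doped synthesis, in which each position keeps the template nucleotide with some probability and otherwise receives one of the other three nucleotides. The Python sketch below models that idea only as an illustration; the template, the `keep_prob` parameter and the function names are hypothetical and are not the synthesis procedure of this application.

```python
import random

NUCLEOTIDES = "ACGT"


def doped_oligo(template: str, keep_prob: float, rng: random.Random) -> str:
    """Hypothetical model of one oligonucleotide biased toward `template`.

    Each position keeps the template base with probability `keep_prob`;
    otherwise one of the other three bases is chosen uniformly at random.
    """
    out = []
    for base in template.upper():
        if rng.random() < keep_prob:
            out.append(base)
        else:
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(out)


def biased_population(template: str, keep_prob: float, size: int, seed: int = 0) -> list:
    """Draw `size` independent doped copies of `template`."""
    rng = random.Random(seed)
    return [doped_oligo(template, keep_prob, rng) for _ in range(size)]


# Hypothetical 9-nt template (three codons) doped at 70% identity per position.
population = biased_population("ATGGCTGAA", keep_prob=0.7, size=5, seed=42)
for oligo in population:
    print(oligo)
```

In this model, `keep_prob=1.0` reproduces the template exactly, while `keep_prob=0.25` gives every base the same 1/4 probability at each position, which corresponds to the unbiased case.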



14. The kit of claim 7, wherein said first and
second oligonucleotides having a desirable bias of random
codon sequences have at least one specified codon at a 
predetermined position. 

15. The kit of claim 7, wherein said pair of 
restriction sites are Fok I. 
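Claims 15 and 21 name Fok I as the restriction enzyme. Fok I recognizes the asymmetric site 5'-GGATG-3' and cleaves outside it (9 nucleotides downstream on that strand and 13 on the complementary strand), so a site can sit on either strand of a vector. The Python sketch below scans a sequence for GGATG and its reverse complement CATCC; the function names are illustrative, and the sketch reports recognition sites only, not cut positions.

```python
from typing import List, Tuple

FOKI_SITE = "GGATG"  # Fok I recognition sequence, 5'->3'
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def find_foki_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every Fok I recognition site.

    '+' marks GGATG on the given strand; '-' marks CATCC, i.e. GGATG on
    the complementary strand. Cleavage positions are not computed here.
    """
    seq = seq.upper()
    rc_site = reverse_complement(FOKI_SITE)  # "CATCC"
    width = len(FOKI_SITE)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == FOKI_SITE:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


# SEQ ID NO: 61 from the sequence listing.
print(find_foki_sites("CGGATGGCTCAGAAGGGCTTTTGCCACAGG"))  # [(1, '+')]
```

Scanning both strands matters because the enzyme's recognition sequence is not palindromic, so a site written as CATCC on the listed strand is still a functional Fok I site.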

16. A cloning system for expressing random
peptides from diverse populations of combined first and
second oligonucleotides having a desirable bias of random
codon sequences, comprising: a set of first vectors
having a diverse population of first oligonucleotides
having a desirable bias of random codon sequences and a
set of second vectors having a diverse population of
second oligonucleotides having a desirable bias of random
codon sequences, said first and second vectors each
having a pair of restriction sites so as to allow the
operational combination of first and second
oligonucleotides into a contiguous oligonucleotide having
a desirable bias of random codon sequences.

17. The cloning system of claim 16, wherein
the desirable bias of random codon sequences of said 
first and second oligonucleotides is unbiased. 

18. The cloning system of claim 16, wherein 
the desirable bias of random codon sequences of said 
first and second oligonucleotides is diverse but biased 
toward a predetermined sequence. 

19. The cloning system of claim 16, wherein 
said first and second oligonucleotides having a desirable 
bias of random codon sequences have at least one 
specified codon at a predetermined position. 
20. The cloning system of claim 16, wherein
said first and second vectors are combined through a pair
of restriction sites.

21. The cloning system of claim 16, wherein
said pair of restriction sites are Fok I. 

22. A composition of matter comprising a 
plurality of cells containing a diverse population of 
expressible oligonucleotides operationally linked to 
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences.

23. The composition of claim 22, wherein said 
cells are procaryotes.

24. The composition of claim 22, wherein said 
expressible oligonucleotides are expressed as peptide 
fusion proteins on the surface of a filamentous 
bacteriophage. 

25. The composition of claim 22, wherein said 
filamentous bacteriophage is M13.

26. The composition of claim 22, wherein said
fusion protein contains the product of gene VIII. 

27. The composition of claim 22, wherein said 
diverse population of oligonucleotides having a desirable 
bias of random codon sequences are produced from the 
combination of diverse populations of first and second
oligonucleotides having a desirable bias of random codon
sequences. 






28. The composition of claim 22, wherein the
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 

29. The composition of claim 22, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 

30. The composition of claim 22, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

31. A plurality of vectors containing a 
diverse population of expressible oligonucleotides having 
a desirable bias of random codon sequences.

32. The vectors of claim 31, wherein said 
oligonucleotides are expressible as fusion proteins on 
the surface of filamentous bacteriophage. 

33. The vectors of claim 31, wherein said 
filamentous bacteriophage is M13. 

34. The vectors of claim 31, wherein said 
fusion protein contains the product of gene VIII. 

35. The vectors of claim 31, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 

36. The vectors of claim 31, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 
37. The vectors of claim 31, wherein said
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

38. A composition of matter, comprising a 
diverse population of oligonucleotides having a desirable 
bias of random codon sequences produced from random 
combinations of two or more oligonucleotide precursor
populations having a desirable bias of random codon
sequences.



39. A method of constructing a diverse
population of vectors having combined first and second
oligonucleotides having a desirable bias of random codon
sequences capable of expressing said combined
oligonucleotides as random peptides, comprising the steps
of:

(a) operationally linking sequences from a
diverse population of first
oligonucleotides having a desirable bias
of random codon sequences to a first
vector;

(b) operationally linking sequences from a
diverse population of second
oligonucleotides having a desirable bias
of random codon sequences to a second
vector; and

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors capable
of being expressed.
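Claim 39 builds the final population by joining a first and a second population of random-codon oligonucleotides. The Python sketch below models the idealized bookkeeping, assuming that every first oligonucleotide can pair with every second one and that all members of each population share one length. Under those assumptions the joined sequence determines both halves, so the number of distinct combined sequences equals the product of the two population sizes. The population sizes, codon counts and helper names are hypothetical; an actual ligation would sample only part of this space.

```python
import itertools
import random

NUCLEOTIDES = "ACGT"


def random_codon_oligo(n_codons: int, rng: random.Random) -> str:
    """Unbiased case: each of the 3 * n_codons positions is A, C, G or T
    with equal probability."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(3 * n_codons))


def make_population(size: int, n_codons: int, seed: int) -> list:
    """Hypothetical precursor population of `size` random-codon oligos."""
    rng = random.Random(seed)
    return [random_codon_oligo(n_codons, rng) for _ in range(size)]


def combine(first: list, second: list) -> list:
    """Join every first oligonucleotide to every second one (first + second)."""
    return [a + b for a, b in itertools.product(first, second)]


first = make_population(size=200, n_codons=4, seed=1)
second = make_population(size=300, n_codons=4, seed=2)
combined = combine(first, second)

# With fixed-length halves, distinct combined = distinct first x distinct second.
print(len(set(first)), len(set(second)), len(set(combined)))
```

The multiplication is the point of the two-step design: two modest precursor populations can, in this idealized model, yield a combined population far larger than either one.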
40. The method of claim 39, wherein the
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

41. The method of claim 39, wherein the 
desirable bias of random codon sequences of said first
and second oligonucleotides is diverse but biased toward 
a predetermined sequence. 

42. The method of claim 39, wherein said first
and second oligonucleotides having a desirable bias of 
random codon sequences have at least one specified codon 
at a predetermined position. 

43. The method of claim 39, wherein steps (a)
through (c) are repeated two or more times. 
44. A method of selecting a peptide capable of
being bound by a ligand binding protein from a population 
of random peptides, comprising: 

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector;

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing said
population of random peptides; and

(e) determining the peptide which binds to
said ligand binding protein.



45. The method of claim 44, wherein the
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

46. The method of claim 44, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is diverse but biased toward 
a predetermined sequence. 
47. The method of claim 44, wherein said first
and second oligonucleotides having a desirable bias of 
random codon sequences have at least one specified codon 
at a predetermined position. 

48. The method of claim 44, wherein steps (a)
through (c) are repeated two or more times. 
49. A method for determining the nucleic acid
sequence encoding a peptide capable of being bound by a
ligand binding protein which is selected from a
population of random peptides, comprising:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector;

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing said
population of random peptides;

(e) determining the peptide which binds to
said ligand binding protein;

(f) isolating the nucleic acid encoding said
peptide; and

(g) sequencing said nucleic acid.
50. The method of claim 49, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is unbiased. 

51. The method of claim 49, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is diverse but biased toward 
a predetermined sequence. 

52. The method of claim 49, wherein said first 
and second oligonucleotides having a desirable bias of 
random codon sequences have at least one specified codon 
at a predetermined position. 

53. The method of claim 49, wherein steps (a) 
through (c) are repeated two or more times.

54. A method of constructing a diverse 
population of vectors containing expressible 
oligonucleotides having a desirable bias of random codon 
sequences, comprising operationally linking a diverse 
population of oligonucleotides having a desirable bias of 
random codon sequences to expression elements.

55. The method of claim 54, wherein said 
oligonucleotides are expressible as fusion proteins on 
the surface of filamentous bacteriophage. 

56. The method of claim 54, wherein said 
filamentous bacteriophage are M13.

57. The method of claim 54, wherein said 
fusion protein contains the product of gene VIII. 
58. The method of claim 54, wherein the
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 

59. The method of claim 54, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 

60. The method of claim 54, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

61. The method of claim 54, wherein said
operationally linking further comprises the steps of:

(a) operationally linking a diverse population 
of first oligonucleotides having a 
desirable bias of random codon sequences 
to a first vector; 

(b) operationally linking a diverse population 
of second oligonucleotides having a 
desirable bias of random codon sequences 
to a second vector; and 

(c) combining the vector products of steps (a) 
and (b) under conditions where said 
populations of first and second 
oligonucleotides are joined together into 
a population of combined vectors. 



62. The method of claim 61, wherein steps (a) 
through (c) are repeated two or more times.




63. A method of selecting a peptide capable of 
being bound by a binding protein from a population of 
random peptides, comprising: 

(a) operationally linking a diverse population
of oligonucleotides having a desirable
bias of random codon sequences to
expression elements;

(b) introducing said population of vectors
into a compatible host under conditions
sufficient for expressing said population
of random peptides; and

(c) determining the peptide which binds to
said ligand binding protein.

64. The method of claim 63, wherein said 
population of random peptides are expressed as fusion 
proteins on the surface of filamentous bacteriophage. 

65. The method of claim 63, wherein said 
filamentous bacteriophage are M13.

66. The method of claim 63, wherein said 
fusion protein contains the product of gene VIII. 

67. The method of claim 63, wherein the 
desirable bias of random codon sequences of said
oligonucleotides is unbiased.

68. The method of claim 63, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 
69. The method of claim 63, wherein said
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

70. The method of claim 63, wherein step (a) 
further comprises: 

(a1) operationally linking a diverse population
of first oligonucleotides having a 
desirable bias of random codon sequences 
to a first vector; 

(a2) operationally linking a diverse population 
of second oligonucleotides having a 
desirable bias of random codon sequences 
to a second vector; and 

(a3) combining the vector products of steps (a1)
and (a2) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

71. The method of claim 70, wherein steps (a1)
through (a3) are repeated two or more times. 
72. A method of determining the nucleic acid
sequence encoding a peptide capable of being bound by a 
ligand binding protein which is selected from a 
population of random peptides, comprising: 

(a) operationally linking a diverse population
of oligonucleotides having a desirable
bias of random codon sequences to
expression elements;

(b) introducing said population of vectors
into a compatible host under conditions
sufficient for expressing said population
of random peptides;

(c) determining the peptide which binds to
said ligand binding protein;

(d) isolating the nucleic acid encoding said
peptide; and

(e) sequencing said nucleic acid.

73. The method of claim 72, wherein said 
population of random peptides are expressed as fusion 
proteins on the surface of filamentous bacteriophage. 

74. The method of claim 72, wherein said 
filamentous bacteriophage are M13.

75. The method of claim 72, wherein said 
fusion protein contains the product of gene VIII. 

76. The method of claim 72, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 
77. The method of claim 72, wherein the
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 

78. The method of claim 72, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

79. The method of claim 72, wherein step (a) 
further comprises: 

(a1) operationally linking a diverse population
of first oligonucleotides having a 
desirable bias of random codon sequences 
to a first vector; 

(a2) operationally linking a diverse population 
of second oligonucleotides having a 
desirable bias of random codon sequences 
to a second vector; and 

(a3) combining the vector products of steps (a1)
and (a2) under conditions where said
populations of first and second 
oligonucleotides are joined together into 
a population of combined vectors. 

80. The method of claim 79, wherein steps (a1)
through (a3) are repeated two or more times. 

81. A vector comprising two copies of a gene 
encoding a filamentous bacteriophage coat protein, both 
copies encoding substantially the same amino acid 
sequence but having different nucleotide sequences.
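Claim 81 recites two copies of a coat-protein gene that encode substantially the same amino acid sequence through different nucleotide sequences, which is possible because most amino acids are specified by more than one codon. The Python sketch below translates DNA with the standard genetic code and checks that property for a pair of sequences. The short example sequences are hypothetical and are not the gene VIII copies of the vector in Figure 5, and the check requires an identical translation, which is stricter than "substantially the same."

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def same_protein_different_dna(copy1: str, copy2: str) -> bool:
    """True if the two copies differ in DNA but encode identical peptides."""
    return copy1.upper() != copy2.upper() and translate(copy1) == translate(copy2)


# Hypothetical example: GCT/GCC both encode Ala, GAA/GAG both encode Glu.
print(translate("ATGGCTGAA"), translate("ATGGCCGAG"))       # MAE MAE
print(same_protein_different_dna("ATGGCTGAA", "ATGGCCGAG"))  # True
```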
82. The vector of claim 81, wherein said
filamentous bacteriophage is M13.

83. The vector of claim 81, wherein said gene 
is gene VIII. 

84. The vector of claim 81, wherein said 
vector has substantially the sequence shown in Figure 5 
(SEQ ID NO: 1).

85. A vector comprising two copies of a gene 
encoding a filamentous bacteriophage coat protein, one 
copy of said gene capable of being operationally linked
to an oligonucleotide wherein said oligonucleotide can be
expressed as a fusion protein on the surface of said
filamentous bacteriophage or as a soluble peptide. 

86. The vector of claim 85, wherein said one
copy of said gene is expressed on the surface of said
filamentous bacteriophage. 

87. The vector of claim 84, wherein said
bacteriophage coat protein is M13 gene VIII.

1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TCGGTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT GCCGTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CCTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TCTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
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1 AAT6CTACTA CTATTAGTA6 AATTGAT6CC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 
61 ATA6CTAAAC AGGTTAHGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGG6AATC AACTGHACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACAT6T TGAGCTACA6 CACCAGATTC AGCAATTAAG CTCTAAGCCA 2J0 
2?i TCT6CAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 
301 TTG6AGTTTG CTTCCGGTCT GGTTCGCm 6AAGCTCGAA HAAAACGCG ATATTTGAAG 360 
351 TCTTTCGG6C HCCTCTTAA TCTTTTTGAT GCAATCC6CT TT6CTTCTGA CTATAATAGT 420 
m CAG6GTAAA6 ACCTGATTTT T6ATTTATGG TCATTCTC6T TTTCT6AACT GTTTMAGCA 480 
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTG6ACGC TATCCAGTCT 540 
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACHCTTTTG CAAAAGCCTC TCGCTATTTT 600 
501 G6TTTTTATC 6TCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 650 
551 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 
721 ATGAATCTH CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATC6CATA AG6TAATTCA 840 
841 CAATGATTAA AGHGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTG6TGTTT 900 
901 CTC6TCAG6G CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TT6GGTAATG 950 
951 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAA6GTCA 6CCAGCCTAT GC6CCTGGTC 1020 
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT AT6ATTGACC 1080 
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 
1141 CAGGCGAT6A TACAAATCTC CGTTGTACTT TGTTTCGCGC HGGTATAAT CGCTGGGGGt 1200 
1201 CAAAGATGAG TGTTTTAGTG TATTCTHCG CCTCTTTCGT TTTAGOnGG JGCCTTCGTA 1250 
1251 GTGGCAHAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCi 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCT6 CTGAGGGTGA 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAi ATATCGGHA 1440 
1441 TGCGTGGGCG ATGGTTGnG TCAHGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 
1501 AHCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT 6GAGCCTT7T 1550 
1561 TTTTTGGA6A THTCAACGT GAAAAAATTA HATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 
1521 TATTCTCACT CCGCT6AAAC TGTTGAAAGT TGTHAGCAA AACCCCATAC AGAAAATTCA 1680 
1681 TTTACTAACG TCTGGAAAGA C6ACAAAACT HAGATCGIT ACGCTAACTA I6AGG6TT6T 1740 
174 CTGT6GAATG CTACAG6CGT TGTAGniGT ACTG6TGAC6 AAACTCAGTG JJACGGTACA 1800 
1801 TGGGnCCTA TTG6GCTTGC TATCCCTGAA AAT6AGGGTG GTGGCTCTGA GGGTGGCG6T 1850 
1851 TCTGA6GGTG GCGGTTCTGA 6GGT6GCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACT6AGCAA 1980 
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 
2041 CA6AATAATA GGHCCGAAA TAGGCA6GGG 6CATTAACTG TTTATACGGG CACTGTTACT 2100 
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2150 
2161 TATGAC6CTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCT6G CTTTAATGAA 2220 
2221 GATCCAHCG TTTGTGAATA TCAAG6CCAA TCGTCTGACC T6CCTCAACC TCCTGTCAAT 2280 
2281 GCT6GCGGCG GCTCTGGTGG TGGnCTGGT GGCGGCTCTG AGGGTGGT6G CTCTGAGG6T 2340 
2341 GGCGGHCTG AGGGTGGCGG CTCTGAGGGA GGCG6TTCCG GTGGTGGCTC TG6TTCCGGT 2400 
2401 GATHTGAn ATGAAAA6AT GGCAAACGCT AATAA66GG6 CTATGACCGA AAATGCCGAT 2450 
2461 GAAAACGC6C TACAGTCTGA CGCTAAAGGC AAACHGAH CTGTC6CTAC TGATTACGGT 2520 
2521 6CTGCTATCG AT6GTTTCAT TGGTGACGTT TCC6GCCTT6 CTAAT6GTAA TGGT6CTACT 2580 
2581 GGTGAnnG CT6GCTCTAA TTCCCAAATG 6CTCAAGTCG GTGAC6GTGA TAATTCACCT 2540 
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTCTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGCTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTCAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG CGTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTCGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTCGTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 [sequence illegible in source] 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTCGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA GCACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCCTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840